{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning (advanced): the Titanic dataset\n",
    "\n",
    "In the following is a more involved machine learning example, in which we will use a larger variety of method in `veax` to do data cleaning, feature engineering, pre-processing and finally to train a couple of models. To do this, we will use the well known _Titanic dataset_. Our task is to predict which passengers are more likely to have survived the disaster. \n",
    "\n",
    "Before we begin, thare there are two important notes to consider:\n",
    " - The following example is not to provide a competitive score for any competitions that might use the _Titanic dataset_. It's primary goal is to show how various methods provided by `vaex` and `vaex.ml` can be used to clean data, create new features, and do general data manipulations in a machine learning context. \n",
    " - While the _Titanic dataset_ is rather small in side, all the methods and operations presented in the solution below will work on a dataset of arbitrary size, as long as it fits on the hard-drive of your machine.\n",
    " \n",
    "Now, with that out of the way, let's get started!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.131498Z",
     "start_time": "2020-01-14T15:31:34.307532Z"
    }
   },
   "outputs": [],
   "source": [
    "import vaex\n",
    "import vaex.ml\n",
    "\n",
    "import numpy as np\n",
    "import pylab as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adjusting `matplotlib` parmeters\n",
    "\n",
    "_Intermezzo:_ we modify some of the `matplotlib` default settings, just to make the plots a bit more legible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.137276Z",
     "start_time": "2020-01-14T15:31:36.133106Z"
    }
   },
   "outputs": [],
   "source": [
    "SMALL_SIZE = 12\n",
    "MEDIUM_SIZE = 14\n",
    "BIGGER_SIZE = 16\n",
    "\n",
    "plt.rc('font', size=SMALL_SIZE)          # controls default text sizes\n",
    "plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title\n",
    "plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels\n",
    "plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels\n",
    "plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize\n",
    "plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all we need to read in the data. Since the _Titanic dataset_ is quite well known for trying out different classification algorithms, as well as commonly used as a teaching tool for aspiring data scientists, it ships (no pun intended) together with `vaex.ml`. So let's read it in, see the description of its contents, and get a preview of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.306073Z",
     "start_time": "2020-01-14T15:31:36.139244Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.vaex-description pre {\n",
       "          max-width : 450px;\n",
       "          white-space : nowrap;\n",
       "          overflow : hidden;\n",
       "          text-overflow: ellipsis;\n",
       "        }\n",
       "\n",
       "        .vex-description pre:hover {\n",
       "          max-width : initial;\n",
       "          white-space: pre;\n",
       "        }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div><h2>titanic</h2> <b>rows</b>: 1,309</div><div><b>path</b>: <i>/Users/jovan/PyLibrary/vaex/packages/vaex-core/vaex/ml/datasets/titanic.hdf5</i></div><div><b>Description</b>: file exported by vaex, by user jovan, on date 2019-07-04 11:02:26.996867, from source /has/no/path/pandasprevious description:\n",
       "\n",
       "The Titanic dataset. \n",
       "A classic dataset used in many data mining tutorials and demos. \n",
       "Perfect for exploratory analysis and building binary classification models to predict survival.\n",
       "\n",
       "Data covers passengers only, not crew.\n",
       "\n",
       "Column description:\n",
       "pclass = passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)\n",
       "survived = Survival (False = No; True = Yes)\n",
       "name = Name\n",
       "sex = Sex\n",
       "sibsp = Number of Siblings/Spouses Aboard\n",
       "parch = Number of Parents/Children Aboard\n",
       "ticket = Ticket Number\n",
       "fare = Passenger Fare\n",
       "cabin = Cabin\n",
       "embarked = Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)\n",
       "boat = Lifeboat (if survived)\n",
       "body = Body number (if did not survive and body was recovered)\n",
       "home_dest = Passenger destination\n",
       "</div><h2>Columns:</h2><table class='table-striped'><thead><tr><th>column</th><th>type</th><th>unit</th><th>description</th><th>expression</th></tr></thead><tr><td>pclass</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>survived</td><td>bool</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>name</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sex</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>age</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>sibsp</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>parch</td><td>int64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>ticket</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>fare</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>cabin</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>embarked</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>boat</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>body</td><td>float64</td><td></td><td ><pre></pre></td><td></td></tr><tr><td>home_dest</td><td>str</td><td></td><td ><pre></pre></td><td></td></tr></table><h2>Data:</h2><table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                           </th><th>sex   </th><th>age   </th><th>sibsp  </th><th>parch  </th><th>ticket  </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                      </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>1       </td><td>True      </td><td>Allen, Miss. Elisabeth Walton                  </td><td>female</td><td>29.0  </td><td>0      </td><td>0      </td><td>24160   </td><td>211.3375</td><td>B5     </td><td>S         </td><td>2     </td><td>nan   </td><td>St Louis, MO                   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>True      </td><td>Allison, Master. Hudson Trevor                 </td><td>male  </td><td>0.9167</td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>11    </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>1       </td><td>False     </td><td>Allison, Miss. Helen Loraine                   </td><td>female</td><td>2.0   </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>1       </td><td>False     </td><td>Allison, Mr. Hudson Joshua Creighton           </td><td>male  </td><td>30.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>135.0 </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>1       </td><td>False     </td><td>Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td><td>female</td><td>25.0  </td><td>1      </td><td>2      </td><td>113781  </td><td>151.55  </td><td>C22 C26</td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ / Chesterville, ON</td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                            </td><td>...   </td><td>...   </td><td>...    </td><td>...    </td><td>...     </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,304</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Hileni                           </td><td>female</td><td>14.5  </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>None   </td><td>C         </td><td>None  </td><td>328.0 </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,305</i></td><td>3       </td><td>False     </td><td>Zabour, Miss. Thamine                          </td><td>female</td><td>nan   </td><td>1      </td><td>0      </td><td>2665    </td><td>14.4542 </td><td>None   </td><td>C         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,306</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Mapriededer                      </td><td>male  </td><td>26.5  </td><td>0      </td><td>0      </td><td>2656    </td><td>7.225   </td><td>None   </td><td>C         </td><td>None  </td><td>304.0 </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,307</i></td><td>3       </td><td>False     </td><td>Zakarian, Mr. Ortin                            </td><td>male  </td><td>27.0  </td><td>0      </td><td>0      </td><td>2670    </td><td>7.225   </td><td>None   </td><td>C         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,308</i></td><td>3       </td><td>False     </td><td>Zimmerman, Mr. Leo                             </td><td>male  </td><td>29.0  </td><td>0      </td><td>0      </td><td>315082  </td><td>7.875   </td><td>None   </td><td>S         </td><td>None  </td><td>nan   </td><td>None                           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load the titanic dataset\n",
    "df = vaex.ml.datasets.load_titanic()\n",
    "\n",
    "# See the description\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling\n",
    "From the preview of the DataFrame we notice that the data is sorted alphabetically by name and by passenger class.\n",
    "Thus we need to shuffle it before we split it into train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.318506Z",
     "start_time": "2020-01-14T15:31:36.310129Z"
    }
   },
   "outputs": [],
   "source": [
    "# The dataset is ordered, so let's shuffle it\n",
    "df = df.sample(frac=1, random_state=31)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shuffling for large datasets\n",
    "As mentioned in [The ML introduction tutorial](tutorial_ml_intro.ipynb), shuffling large datasets in-memory is not a good idea. In case you work with a large dataset, consider shuffling while exporting:\n",
    "\n",
    "```\n",
    "df.export(\"shuffled\", shuffle=True)\n",
    "df = vaex.open(\"shuffled.hdf5)\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Split into train and test\n",
    "Once the data is shuffled, let's split it into train and test sets. The test set will comprise 20% of the data. Note that we do not shuffle the data for you, since vaex cannot assume your data fits into memory, you are responsible for either writing it in shuffled order on disk, or shuffle it in memory (the previous step)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.346328Z",
     "start_time": "2020-01-14T15:31:36.320295Z"
    }
   },
   "outputs": [],
   "source": [
    "# Train and test split, no shuffling occurs\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2, verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sanity checks\n",
    "\n",
    "Before we move on to process the data, let's verify that our train and test sets are \"similar\" enough. We will not be very rigorous here, but just look at basic statistics of some of the key features.\n",
    "\n",
    "For starters, let's check that the fraction of survivals is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:36.960429Z",
     "start_time": "2020-01-14T15:31:36.348065Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3daZhlVXm38ftPNwGlaRBoUJBBnFAUFNqoMaIGhzgjmAQiCiYIiQE1oISoICpGQXGeaAcQFV4gglFBo0RQTBRtB4ZWUEAQELSZGrqZ4Xk/7F1yOFRV76quqlPVdf+u61ycvdYenlNUn1XP3mtIVSFJkiRJGrs1Bh2AJEmSJM1UJlSSJEmSNE4mVJIkSZI0TiZUkiRJkjROJlSSJEmSNE4mVJIkSZI0TiZUmjGSzEmyPMkWg45lOkjy2SRv7bDf1kmWj/Maj0oyqWsrdP0ckqTBSPLdJH83Cef9f0nePtHn7Tn/Y5LcNFnnl4aYUGnStMnP0OveJLf1bL9qrOerqnuqal5V/W4y4h1Okn2SnD1V1xuLqtqnqv6jw36XVdW8qYhpZYb7eXb9HB3Pv0OSnyW5NclPkmy3kv2fn+TnSVYkuTLJruM9l6SZY6Lbp57z/ijJnhMZa3vef0py5kSft6uq+quqOmlQ1+8qybVJ/nJou6p+XVXrT9C510jyoSQ3JrkuyRGj7PuPfb9jtyapJNuO9VyaGUyoNGna5Gde+8f874CX9pR9uX//JHOnPsrpqf2ynVH/Pgf9/y/JWsB/AccCDwFOBL6aZM0R9n8i8EXgEGA94MnAL8ZzLkkzy1jbp9lqOrVFg25jgAOA5wGPB3YA/i7J3sPtWFWf6/sdOxD4VVUtGeu5NDNMi38kmp2SHJHkpCQnJrkF2DPJ09s7fDcluSbJR4f+iE0yt73Ds1W7/aW2/ptJbknywySPGOFaD05yQpLr23P/OMlGbd36SY5tr3dVkne1jcgTgY8Dz2zvMF3X8XP9Y5LL25guS7J7z+c9rme/+3WnS/KDJO9O8kNgBfDWJD/qO/dbkpza8/kPb9//Jslf9+z3Z0luSLLdMNcZ9vO2dXPau2bXJ7kU+GtG0R7/liQXALe2ZW9vP/ctSZYkeVlbPuzPs/dztNv/lOSSNoavJnlYl587sDNQVfWxqroD+BCwFvCsEfY/FPhkVf13Vd1dVddV1WXjPJek1Uj7XXho+112XZIvJ1m/rVsnTVe1G9r25NwkD0lyNPAU4LPtd9zRw5x32GPbug2SHJ/mKcuVSd7RtkVPBj4MPLs977UdP8Pr+tqiv2nL35fksz37bZPk7p7tH7Xtwrk03+ubtmV7tvEvT/Konv03S/OEb+hzvCLJ+e3nOyfJ43v2/fMk57UxfQn4s1Hi/6c0XQ0/keRG4JA21rPbn9/SJF9Ism67/ynAxsC32xjfMMxn2yLJGe3xv06yV5efZWsv4KiquqbtKfNhYO8xHPuFCTqXpiETKg3aK4ATaJ4QnATcDbwR2Ah4Bs0f9PuNcvzf0/xhvAHNXcZ3j7Dfa4EHAw8HNgReD9ze1n0JuA14JLAQeDHw2qq6ANgfOKe9yzSUgL06yc+Gu0iS+cAHgedV1brtZzh/9B/B/bwa+AdgPs0X7BOSbN33eU8Y5rgTgT16tl8I/L6qhrv2sJ+3rftn4PnA9sCfA3/bIebd2+ut127/muZzrwe8BzghySYj/Tx7JXk+8C7glcBmwO+BL/fUfzPJm0eIY1vgvKGNqirggrZ8OE8D1khyYZtcHj/0B8E4ziVp9fIWmu/Cv6RpN+6iubECsA8wl+Y7aiOa77U7q+og4CfAPu133EHDnHfYY9u6LwPLgK1pvn93AV5dVT8H3gSc3Z73oQBJXpvkx8MF336XvR/YuW2LnglcOIbPvyfwGmBd4E8JXFWtAL7G/dub3YH/rqobkzwN+CRNm7IhTS+Ar6a5Ibo28FXgGJo2+5vAy1YSx040PQc2AoYS1HcBDwWeCDwWeFsb298AfwSe3/6cPjrM+U4BLgYeRtOefijJMwCS7LySZPXx9LQL7fuVtglJHkOTaH9pVc+l6cuESoP2g6r6elXdW1W3VdVPqurc9onBZcAiRn8q8J9Vtbiq7qJpjJ40wn530XwhP6odi7W4qpYn2YzmacS/VtWtVXUtTSKz+0gXrKovVtUOo8RUNInQ2u3dp1+Osm+/z1fVr6rqrqpaDnxjKJYk29A0tN8Y5rgTgF3aBgtGSLw6fN6/BT5UVVdV1fXA+zrE/JF2/9sAqurk9nPfW1UnAJfTJG5dvAr4bFX9oqpup+mO96wkD2/P/cKq+sAIx86j+WOk1zKaPwiGsxnNHw27AI/hviR2POeStHrZDzikqn7ffhe9k6ZbVmjakwXAI9u26idtotHFsMcm2ZImeTiw/W6+Bvgoo7dFx1bVn6/kekNt0dVV9auOMULzPXxx2xbd3Vd3AvdPqHrbm/2Aj1fVT9u2dhHN0/0d2893R1V9sj3vl1n5DcfLquoz7bluq6qLquq7VXVnT/vVqedAkkfT3Cx8a1XdUVWLaZ4avRqgqv5nKFkd5tg128/R2y50bRP2As6sqqsn4FyapkyoNGhX9m60j+dPb7s83ExzJ+oBTzJ69N5NupXmD+HhHAecCZyc5Oq2y8NcYEuaL7Y/tN0TbgI+AWwyng9TVTfTNDT/Alyb5Bvt3amuruzb7m24XgWc2jbu/de9CLgUeHGSecBLGP5J1so+76Z9MVwx1piT7N126Rg6/zaM/v+w16a912x/njfSJD8rs5wmKeo1H7hlhP1vp0lgL6mqW4D3Ai8a57kkrSbapGlz4Iye77Gf0/zNtCHwOeB7wH+m6fb8H0nmdDz9SMduCawNLO255kcYf1t0I02b8Qaatuhrvd30Ouhvi3r9N7BJku3b9u3RwNfbui1puqvf1PM5FtB8h28KXNV3rpW1Mf3ty6ZJTmnb8ZuBzzK29mXp0M2/nuuvtH1pb9rewf3bhZW2Ce3v0p70dPcb77k0vZlQadD6p+Q+hqZbwqOqaj5wGJBVvkhzN+vwqnocTReOV9A0NlfSJGIbVNX67Wt+VQ3N6DbmKcOr6ptV9VyaLgWXtJ8JmnFRD+7Zdbg7Yf3X+xawWZrxR3swfJI0ZKjb3yuAX1TV5cPss7LPew3NHxJDukxR3zs+a2vgUzRdBzesZnali7jv/+HKfp6/p2mQh863Ls2kEFd3iGMJzd3HoWND0yVkyQj7nz9KPGM9l6TVRNvF92rgr3q+J9evqrWrGWt5R1UdVlXb0Dx1+Rvue5I06nfcKMdeSXMj5yF9381DvSHG0xadXlU70yQSv6P5bobxtUW9570L+E+a9uZVwGk9ScqVwGF9P7cHV9WpNO3Lw/tOt7I2pj+O97fxP6H9G2Ef7v83wmg/p98DC5I8qO/6XdoXgF/S0y6071fWJvwVTff3r07AuTSNmVBpulmX5tH3iiSPY/TxU50l+askT0gz+cLNNN0u7qmqK2nuFn4gyfw0A4AflWSn9tA/AA9Px9ndkjwsyUuTPJimX/wK4J62+hc03dc2TzO4+ZCVna+q7gS+QjMuax7w3VF2P5FmLNO+jJB4dfi8JwNvSjPIeEPg31YWY595NA3aUpo8ZB+aJ1RDVvbzPBH4xzSTaaxF89TonKrqv6s5nO8Cc5L8S3vsG2n+P39vhP2Pba+1Vfv/62Du60451nNJWr18Gnhfks0Bkmyc5KXt++cmeXxPe3I3933P/4Gma/awRjq2qn4L/Ag4Ksm67Xfzo3PfFOB/ADYfQ1u0WZIXt99td9Aka71t0XPafR7C2L/noWljdueBN/oWAQckWZjGvCQva+P4PrB2mskm5ibZAxjrchTrtp/l5jRrUh7YVz/az/8SmhtpRyRZK8kONN3xus7qeDzwliQPbX8v3kTT+2U0ewEn9z0VG++5NI2ZUGm6OYjmC+gWmic7E7XuxabAqTQN2BKa7n8ntnV7AuvQ3DG6kWbQ6tAdu+8Av6HpInctQJK9kvQOJu01h2Yw8zXA9cBf0Aw6huZp02k0kxv8mGZgbxcnAM8FTqqqe0baqU06FtNMtnDyKOcb7fN+CvifNsaf0NyF7KyaSTA+SvP5rqFJps7t2eUBP8++479F083ztPb4LWjugAKQ5NtJDh7h2rcDL6e5Y3lT+zlf3t5NHe7/22dofgcW03T7WAH8a5dzSVrtHUXTTnw3zSy0/0czvTU0XcT+i6aduhA4g/u+cz8EvCbN+kJHDXPe0Y7dAxh6qn8DTfs31OXvWzTjUf+Y5Cr404yyPx0h/jnAv9N0i7+eZlKEA9q602luHv2SJonrf3rSxffba6xH83MCoKr+l6ab4TE0352/phljVW1S8QqaSaFupJkQ6euMzWE0vUyW0bQTX+mrfw/wnra74f69Fe2Tx7+lmRDiWpqf71uq6hz4U7I72my+H6VpH39Fk5SeUlXHDVUmuTTJbj3b84Bduf/sfp3OpZknze+XJEmSJGmsfEIlSZIkSeNkQiVJkiRJ42RCJUmSJEnjZEIlSZIkSeNkQiVJkiRJ4zR30AFMpo022qi22mqrQYchSZogP/3pT6+rqgWDjqML2yBJWn2M1v6s1gnVVlttxeLFiwcdhiRpgiS5YtAxdGUbJEmrj9HaH7v8SZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jit1gv7zmRbHXL6oEOYdS5/34sHHYIkDZztz2DYBkkzl0+oJEmrvST7J1mc5I4kx/WUb5WkkizveR3aU58kRya5vn0dlSQD+RCSpGnJJ1SSpNng98ARwAuABw1Tv35V3T1M+b7ALsD2QAHfAS4DPj1JcUqSZhifUEmSVntVdWpVfRW4foyH7gUcXVVXVdXVwNHA3hMdnyRp5jKhkiQJrkhyVZJjk2zUU74tcF7P9nltmSRJgAmVJGl2uw54CrAlsCOwLvDlnvp5wLKe7WXAvJHGUSXZtx2rtXjp0qWTFLIkaToxoZIkzVpVtbyqFlfV3VX1B2B/4PlJ5re7LAfm9xwyH1heVTXC+RZV1cKqWrhgwYLJDV6SNC2YUEmSdJ+hRGnoCdQSmgkphmzflkmSBJhQSZJmgSRzk6wNzAHmJFm7LXtqkscmWSPJhsBHgbOraqib3/HAgUk2S7IpcBBw3EA+hCRpWjKhkiTNBm8HbgMOAfZs378d2Br4FnALcCFwB7BHz3HHAF8HLmjrT2/LJEkCXIdKkjQLVNXhwOEjVJ84ynEFHNy+JEl6AJ9QSZIkSdI4mVBJkiRJ0jhNeUKVZPckv0qyIsmlSZ7Zlu+c5KIktyY5K8mWPcckyZFJrm9fR420BogkSZIkTZUpTaiSPA84EngtzeKJOwGXtavSnwocCmwALAZO6jl0X2AXmulqtwNeAuw3dZFLkiRJ0gNN9ROqdwLvqqofVdW9VXV1VV0N7AosqapTqup2moHD2yfZpj1uL+Doqrqq3f9oYO8pjl2SJEmS7qdTQpVkQZIFPdtPTHJEkj1GO67vHHOAhcCCJJckuSrJx5M8CNgWOG9o36paAVzaltNf377flmEk2TfJ4iSLly5d2jU8SZIkSRqzrk+oTgZeCtB2z/s+8Arg00kO6niOTYA1gVcCzwSeBDyZZh2QecCyvv2X0XQLZJj6ZcC84cZRVdWiqlpYVQsXLFjQXy1JkiRJE6ZrQrUd8KP2/SuBS6pqW+A1dB/LdFv7349V1TVVdR3wQeBFwHJgft/+82kWWmSY+vnA8nZ9EEmSJEkaiK4J1YNokhqA5wJfa9//DNi8ywmq6kbgKmC4JGgJzYQTACRZB3hkW/6A+vb9EiRJkiRpgLomVL8Bdk2yOfB84Ntt+SbATWO43rHAAUk2TvIQ4E3AN4DTgCck2S3J2sBhwPlVdVF73PHAgUk2S7IpcBBw3BiuK0mSJEkTrmtC9U6a6c4vB35UVee25S8Afj6G670b+Anwa+BX7bHvqaqlwG7Ae4AbgacCu/ccdwzwdeAC4ELg9LZMkiRJkgZmbpedqurUJFsAm3L/2fbOBL7S9WJVdRfw+vbVX3cmsM0DDmrqCji4fUmSJEnStLDSJ1RJ1kxyLbBRVf28qu4dqquqc3u65UmSJEnSrLLShKp9qnQXw08mIUmSJEmzVtcxVB8D/j1Jpy6CkiRJkjQbdE2Qngk8C7g6yYXAit7KqnrZRAcmSZIkSdNd14TqOsYw+YQkSZIkzQZdZ/l77WQHIkmSJEkzTdcxVAAkWZjk75Ks026v47gqSZIkSbNVp2QoySbA14Cn0Mz292jgMuCDwO3AGycrQEmSJEmarro+ofoQcC2wIXBrT/kpwPMnOihJkiRJmgm6dtfbGdi5qm5M0lt+KbDFhEclSZIkSTNA1ydUDwLuHKZ8AU2XP0mSpq0k+ydZnOSOJMf1lD8tyXeS3JBkaZJTkjysp/7wJHclWd7z2nogH0KSNC11Tai+D+zds11J5gD/BvzPRAclSdIE+z1wBPD5vvKHAIuArYAtgVuAY/v2Oamq5vW8LpvsYCVJM0fXLn8HA99L8hRgLeBoYFtgPeAZkxSbJEkToqpOhWa2WuDhPeXf7N0vyceB701tdJKkmazTE6qq+iXwROD/gG8Da9NMSPHkqrp08sKTJGlK7QQs6St7adslcEmSfx5EUJKk6avzGlJVdS3wjkmMRZKkgUmyHXAY8PKe4pNpugT+AXgq8JUkN1XViSOcY19gX4AttnDOJkmaDbquQ7XTCFVFMynFpVV1w4RFJUnSFEryKOCbwBur6pyh8raHxpD/S/IR4JXAsAlVVS2iScBYuHBhTV7EkqTpousTqrNpkieAoXnTe7fvTfI14NVVtWLiwpMkaXIl2RI4E3h3VX1xJbsX97WDkiR1nuXvxcCvgD2BR7WvPWn6me/Wvp4EvG8SYpQkaZUkmZtkbWAOMCfJ2m3ZZsB3gU9U1aeHOe7lSR6Sxp8DbwD+a2qjlyRNZ12fUB1B0w2id4r0y5IsBY6sqh2T3AN8DDhgooOUJGkVvZ37jwPeE3gnzROnrYF3JPlTfVXNa9/uTjPV+lrAVTRt3hemJGJJ0ozQNaF6PHD1MOVXt3UAFwAPnYigJEmaSFV1OHD4CNXvHOW4PSYjHknS6qNrl79fAm9LstZQQfv+rW0dwObAtaOdJMnZSW7vWW3+4p66nZNclOTWJGe1fdqH6pLkyCTXt6+jktiHXZIkSdJAdU2oXg+8ALi6TYrOonk69QJgaE2OrYFPdjjX/j2rzT8WIMlGwKnAocAGwGLgpJ5j9gV2AbYHtgNeAuzXMXZJkiRJmhSduvxV1blJHkHT5/yxNDMcnQh8eWhWv6o6fhXi2BVYUlWnACQ5HLguyTZVdRGwF3B0VV3V1h8NvA54wABiSZIkSZoqY1nYdwVwzARc871J3gdcDLytqs4GtgXO671Wkkvb8ov669v32w53chdVlCRJkjRVOidUSTYHnglsTF9Xwar6YMfT/BvNmKs7aWZO+nqSJwHzgKV9+y4D1m3fz2u3e+vmJUlV3W/hRBdVlCRJkjRVOiVUSV5FM23s3TSJT2+iUkCnhKqqzu3Z/EKSPYAXAcuB+X27zwduad/3188HlvcnU5IkSZI0lbpOSvEu4GhgflVtVVWP6HltvQrXH1pxfgnNhBMAJFkHeGRbTn99+34JkiRJkjRAXROqTYDPVtU9471QkvWTvKBndfpXATsB/w2cBjwhyW7tSvaHAee3E1IAHA8cmGSzJJsCBwHHjTcWSZIkSZoIXcdQnQE8FbhsFa61JnAEsA1wD81kE7tU1cUASXYDPg58CTiXZozVkGNopmW/oN3+LBMzQYYkSZIkjVvXhOo7wJFJtqVJau7qrayqU1d2gqpaCjxllPozaZKt4eoKOLh9SZIkSdK00DWhGnoa9NZh6gqYMzHhSJIkSdLM0XVh365jrSRJkiRp1jBRkiRJkqRx6pRQpfH6JEuS3Jpk67b8kCR/O7khSpIkSdL01PUJ1RuBtwOLaNaNGnI1sP9EByVJkiRJM0HXhOqfgNdV1UeAu3vKfwZsO+FRSZIkSdIM0HWWvy2BC4cpvwt40MSFI0mSJE2+rQ45fdAhzEqXv+/Fgw5hwnV9QnUZsMMw5S8Cfjlx4UiSJEnSzNH1CdUHgI8neTDNGKqnJ3k1zUK7/zBZwUmSJEnSdNZ1Hapjk8wF/gN4MPBFmgkp3lBVJ01ifJIkSZI0bXV9QkVVfQb4TJKNgDWq6o+TF5YkSZIkTX9d16FaI8kaAFV1HbBGkn2S/MWkRidJkiRJ01jXSSlOBw4ASDIPWAy8H/hektdMUmySJE2IJPsnWZzkjiTH9dXtnOSiduH6s5Js2VOXJEcmub59HZUkD7iAJGnW6ppQ7Qh8t32/K3AzsDHwOuDNkxCXJEkT6ffAEcDnewvbbuynAocCG9DcMOwdG7wvsAuwPbAd8BJgvymIV5I0Q3RNqNYFbmrfPx84raruokmyHjkZgUmSNFGq6tSq+ipwfV/VrsCSqjqlqm4HDge2T7JNW78XcHRVXVVVVwNHA3tPUdiSpBmga0L1O+AZSdYBXgB8py3fALh1MgKTJGkKbAucN7RRVSuAS9vyB9S377dFkqRW14TqgzRTpV9FM13699vynYALJiEuSZKmwjxgWV/ZMpqeGcPVLwPmjTSOKsm+7VitxUuXLp3wYCVJ00+nhKqqjgGeTrOI719W1b1t1aU0/c4lSZqJlgPz+8rmA7eMUD8fWF5VNdzJqmpRVS2sqoULFiyY8GAlSdPPWNahWkwzWBeAJGtW1emTEpUkSVNjCc04KQDaru2PbMuH6rcHftxub99TJ0lS53Wo3pBkt57tzwG3Jbk4yWMnLTpJkiZAkrlJ1gbmAHOSrJ1kLnAa8IQku7X1hwHnV9VF7aHHAwcm2SzJpsBBwHED+AiSpGmq6xiqNwBLAZLsBPwt8PfAL2hmPBqTJI9OcnuSL/WUuQ6IJGmyvB24DTgE2LN9//aqWgrsBrwHuBF4KrB7z3HHAF+nGS98Ic26jMdMXdiSpOmua5e/zYDL2/cvBU6pqpOTXACcM47rfgL4ydBGzzog+9A0XO+mWQfkae0uveuAFM0sg5cBnx7HtSVJs0xVHU4zJfpwdWcC24xQV8DB7UuSpAfo+oTqZmBodO3zgP9p398FrD2WCybZnWZNq//pKXYdEEmSJEkzTteE6tvAZ9qxU48CvtmWbwv8tuvFkswH3kXTB72X64BIkiRJmnG6JlT/AvwvsBHwyqq6oS3fAThxDNd7N/C5qrqyr3zC1gFxDRBJkiRJU6XTGKqquhk4YJjyd3S9UJInAc8FnjxM9YStA1JVi4BFAAsXLhx2nRBJkiRJmgid16EakuShwJ/1llXV7zoc+mxgK+B37YOleTRT1z6eZnIJ1wGRZqGtDnE5u6l2+ftePOgQJElabXRKqJKsB3yUZrr0PxtmlzkdTrMI+H8922+mSbD+ud1+f7vW1emMvA7IGTSz/B0EfKxL7JIkSZI0WbqOofoAzVOhXYDbadagegtwFfB3XU5QVbdW1bVDL5pufLdX1VLXAZEkSZI0E3Xt8vdCYI+qOifJPcBPq+qkJNcA+wH/OdYLt2uC9G67DogkSZKkGaXrE6r1gSva98uADdv3PwT+YqKDkiRJkqSZoGtCdSmwdfv+V8Du7ZTluwI3jHiUJEmSJK3GuiZUxwHbte/fR9PN707g/cCREx+WJEmSJE1/Xdeh+lDP++8meRywI/CbqrpgsoKTJEmSpOlszOtQAVTVFdw3pkqSJEmSZqWuXf5IskuS7ye5rn2dk+QVkxmcJEmSJE1nnRKqJAcBJwEXc9/05RcBJyR58+SFJ0mSJEnTV9cuf28G9q+qz/SUfT7Jj4F30Sz8K0mSJEmzStcuf/OAs4YpP6utkyRJkqRZp2tC9VXglcOU7wZ8beLCkSRJkqSZo2uXv0uAQ5I8B/hhW/a09vXBJAcO7VhVH5zYECVJkiRpeuqaUO0N3Ag8pn0NuRF4bc92ASZUkiRJkmaFrgv7PmKyA5EkSZKkmabzOlSSJEmSpPszoZIkSZKkcTKhkiTNakmW973uSfKxtm6rJNVXf+igY5YkTR9dJ6WQJGm1VFV/Wk8xyTrAH4BT+nZbv6runtLAJEkzwohPqJJ8Psm67fudkph8SZJWd68E/gicM+hAJEkzw2hd/vYE1mnfnwVsMPnhSJI0UHsBx1dV9ZVfkeSqJMcm2Wikg5Psm2RxksVLly6d3EglSdPCaE+dLgcOSPJtIMDTk9w43I5V9f1JiE2SpCmTZAvgWcA/9hRfBzwF+AWwIfAJ4MvAC4Y7R1UtAhYBLFy4sD8pkySthkZ7QvUW4HU0T6cKOA04e5jXWV0vluRLSa5JcnOSXyfZp6du5yQXJbk1yVlJtuypS5Ijk1zfvo5Kkq7XlSSpg9cAP6iq3w4VVNXyqlpcVXdX1R+A/YHnJ5k/sCglSdPKiAlVVf1XVW1M09UvwLbAgmFeG4/heu8Ftqqq+cDLgCOS7Nh2nzgVOLS93mLgpJ7j9gV2AbYHtgNeAuw3hutKkrQyrwG+sJJ9hp46eVNPkgR0mOWvqm5K8hzgN6s6w1FVLendbF+PBHYEllTVKQBJDgeuS7JNVV1E06f96Kq6qq0/mubp2adXJR5JkgCS/AWwGX2z+yV5KnAT8BvgIcBHgbOratmUBylJmpY6rUNVVd8D5iT5hyQfSPL+JK9NstZYL5jkk0luBS4CrgHOoHn6dV7P9VYAl7bl9Ne377dFkqSJsRdwalXd0le+NfAt4BbgQuAOYI8pjk2SNI11mgo9yeOBbwLrARe0xa8DDk/y11X1q64XrKrXJzkAeDrwbJrGaR7QPx3SMmDd9v28dru3bl6S9M/ElGRfmi6CbLHFFl3DkiTNYlU1bDfyqjoROHGKw5EkzSCdnlABH6GZ4WiLqnpmVT0T2ILmSdGHx3rRqrqnqn4APBz4Z2A50D/Adz7NHUGGqZ8PLB9mWluqalFVLayqhQsWLBhraJIkSZLUWdeE6hnAW6vq5qGC9v3bgL9chevPpRlDtYRmwrIlXFEAABDaSURBVAngTyvVD5XTX9++7x2PJUmSJElTrmtCdTuw/jDl67V1K5Vk4yS7J5mXZE6SF9D0Q/8uzZTsT0iyW5K1gcOA89sJKQCOBw5MslmSTYGDgOM6xi5JkiRJk6JrQvV14DNJntEmQ3OS/CVwDPC1jucomu59VwE3Ah8A3tROz74U2A14T1v3VGD3nmOPaWO4gGZQ8OltmSRJkiQNTKdJKYA30qzNcQ5wT1u2Bk0y9aYuJ2iTpmeNUn8msM0IdQUc3L4kSZIkaVrolFBV1U3Ay5M8CngczYKGv6yqSyYzOEmSJEmazro+oQKgTaBMoiRJkiSJ7mOoJEmSJEl9TKgkSZIkaZxMqCRJkiRpnFaaUCWZm+T17fpPkiRJkqTWShOqqrobeD+w5uSHI0mSJEkzR9cufz8CdpjMQCRJkiRppuk6bfpngKOTbAn8FFjRW1lVP5vowCRJkiRpuuuaUJ3Q/veDw9QVMGdiwpEkSZKkmaNrQvWISY1CkiRJkmagTglVVV0x2YFIkiRJ0kzTeR2qJC9M8o0kv0yyeVu2T5KdJy88SZIkSZq+OiVUSV4FnAz8hqb739AU6nOAgycnNEmSJEma3ro+oToYeF1V/Stwd0/5j4AnTXhUkiRJkjQDdE2oHg38cJjy5cD8iQtHkiRJkmaOrgnV74HHDFO+E3DpxIUjSZIkSTNH14RqEfDRJM9otzdPshdwFPCpSYlMkqQpkuTsJLcnWd6+Lu6p2znJRUluTXJWu8i9JElAx4Sqqo4CTgW+A6wDnAV8Gvh0VX1i8sKTJGnK7F9V89rXYwGSbETT/h0KbAAsBk4aYIySpGmm68K+VNXbkrwHeDxNIvbLqlo+aZFJkjR4uwJLquoUgCSHA9cl2aaqLhpoZJKkaaHzOlStAm4HbgXumfhwJEkamPcmuS7J/yZ5dlu2LXDe0A5VtYJm7PC2w50gyb5JFidZvHTp0kkPWJI0eF3XoVoryYeBG2galvOBG5J8JMnaYzjH55JckeSWJD9P8sKe+hH7qKdxZJLr29dRSTK2jypJ0oj+Ddga2Ixm3PDXkzwSmAcs69t3GbDucCepqkVVtbCqFi5YsGAy45UkTRNdn1B9CnglsA/NFOqPat+/Avhkx3PMBa4EngWsR9Mf/eQkW3Xoo74vsAuwPbAd8BJgv47XlSRpVFV1blXdUlV3VNUXgP8FXsTwy4PMB26Z6hglSdNT1zFUfwPsWlXf6Sm7LMkfga8A/7CyE7TdJA7vKfpGkt8COwIbMnof9b2Ao6vqqrb+aOB1NBNjSJI00QoIsISmDQIgyTrAI9tySZI6P6FaAVw9TPnVwG3juXCSTWjWtlrCyvuo36++fW//dUnSKkuyfpIXJFk7ydwkr6JZZ/G/gdOAJyTZre3ifhhwvhNSSJKGdE2oPga8I8mDhgra94e2dWOSZE3gy8AX2kZpZX3U++uXAfOGG0dl/3VJ0hitCRwBLAWuAw4Adqmqi6tqKbAb8B7gRuCpwO6DClSSNP2M2OUvydf6ip4NXJ3k/Hb7ie3x64zlgknWAL4I3Ans3xavrI96f/18YHlV1ViuLUlSvzZpesoo9WcC20xdRJKkmWS0MVTX921/pW/7t2O9WPtE6XPAJsCLququtmplfdSX0ExI8eN2e3vsvy5JkiRpwEZMqKrqtZNwvU8BjwOeW1W9Y69OA96fZDfgdB7YR/144MAkZ9AMFD6IcXQ1lCRJkqSJNNaFfcetXVdqP+BJwLVJlrevV3Xoo34M8HXgAuBCmqTrmKmKXZIkSZKG02na9CQPoZny/DnAxvQlYlW18crOUVVX0ExBO1L9iH3U27FSB7cvSZIkSZoWuq5DdTzNNOVfAP5A0+1OkiRJkma1rgnVs4FnVdXPJjEWSZIkSZpRuo6hunQM+0qSJEnSrNA1SXoj8N4k2yeZM5kBSZIkSdJM0bXL3yXAg4CfATTLSd2nqkyyJEmSJM06XROqE4H1gDfgpBSSJEmSBHRPqBYCf15VF05mMJIkSZI0k3QdQ/VLYP5kBiJJkiRJM03XhOrtwAeTPDfJJkk26H1NZoCSJEmSNF117fJ3Rvvfb3P/8VNpt52UQpIkSdKs0zWhes6kRiFJkiRJM1CnhKqqvjfZgUiSJEnSTNMpoUqyw2j1VfWziQlHkiRJkmaOrl3+FtOMlepd0bd3LJVjqCRJkiTNOl0Tqkf0ba8JPBl4G/DvExqRJEmSJM0QXcdQXTFM8SVJlgHvAL45oVFJkiRJ0gzQdR2qkfwWeNJEBCJJkiRJM03XSSn6F+8N8DDgcODiCY5JkiRJkmaErk+orgOW9rz+CJwPPAV4/eSEJknS5EuyVpLPJbkiyS1Jfp7khW3dVkkqyfKe16GDjlmSNH2Md2Hfe2kSq0uq6u6JDUmSpCk1F7gSeBbwO+BFwMlJntizz/q2d5Kk4XR6QlVV3+t7nVNVF421cUmyf5LFSe5Iclxf3c5JLkpya5KzkmzZU5ckRya5vn0dlSQPuIAkSWNUVSuq6vCquryq7q2qb9CMEd5x0LFJkqa/UROqJBt0eY3her8HjgA+33edjYBTgUOBDWjWvTqpZ5d9gV2A7YHtgJcA+43hupIkdZJkE+AxwJKe4iuSXJXk2LbNGunYfdsbh4uXLl066bFKkgZvZU+o+sdODff6Y9eLVdWpVfVV4Pq+ql2BJVV1SlXdTjPZxfZJtmnr9wKOrqqrqupq4Ghg767XlSSpiyRrAl8GvlBVF9G0g08BtqR5YrVuWz+sqlpUVQurauGCBQumImRJ0oCtbAxV/9ipXn8NvBGYiD7l2wLnDW1U1Yokl7blF/XXt++3He5ESfaleaLFFltsMQGhSZJmgyRrAF8E7gT2B6iq5TS9JgD+kGR/4Jok86vq5sFEKkmaTkZNqKrqe/1lSXYAjgR2Ao4B3j0BccyjedrVaxnNncCh+mV9dfOSpKqqL+ZFwCKAhQsX3q9OkqThtONyPwdsAryoqu4aYdehdsVxvJIkYAwL+yZ5RJITgHOBG4DHV9UbqmoiOokvB+b3lc0Hbhmhfj6wvD+ZkiRpnD4FPA54aVXdNlSY5KlJHptkjSQbAh8Fzq6qZSOdSJI0u6w0oUqyYZKP0HS9eyjw9Kr6u6q6dALjWEIz4cTQNdcBHsl9A4LvV9++7x0sLEnSuLSzyu4HPAm4tme9qVcBWwPfornBdyFwB7DHwIKVJE07o3b5S/JW4GDgcuDlVfWtVblYkrntNecAc5KsTTMG6zTg/Ul2A04HDgPObwcEAxwPHJjkDJruFgcBH1uVWCRJAqiqKxi9C9+JUxWLJGnmWdmkFEcAtwFXAa9P8vrhdqqql3W83tuBd/Rs7wm8s6oOb5OpjwNfoulWuHvPfsfQ3CW8oN3+bFsmSZIkSQOzsoTqeO4bgLvKqupwminRh6s7E9hmhLqieVJ28ETFIkmSJEmramWz/O09RXFIkiRJ0ozTeZY/SZIkSdL9mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4mVBJkiRJ0jiZUEmSJEnSOJlQSZIkSdI4zZiEKskGSU5LsiLJFUn+ftAxSZJmB9sgSdJI5g46gDH4BHAnsAnwJOD0JOdV1ZLBhiVJmgVsgyRJw5oRT6iSrAPsBhxaVcur6gfA14BXDzYySdLqzjZIkjSamfKE6jHAPVX1656y84Bn9e+YZF9g33ZzeZKLpyA+3Wcj4LpBBzEeOXLQEWiG8Xd9MLYcwDVtg2YO/11qtvB3feqN2P7MlIRqHrCsr2wZsG7/jlW1CFg0FUHpgZIsrqqFg45Dmmz+rs8qtkEzhP8uNVv4uz69zIguf8ByYH5f2XzglgHEIkmaXWyDJEkjmikJ1a+BuUke3VO2PeBgYEnSZLMNkiSNaEYkVFW1AjgVeFeSdZI8A3g58MXBRqZh2NVFs4W/67OEbdCM4r9LzRb+rk8jqapBx9BJkg2AzwPPA64HDqmqEwYblSRpNrANkiSNZMYkVJIkSZI03cyILn+SJEmSNB2ZUEmSJEnSOJlQSVJHSZ6T5JgkX223d0jygMVdJUmaSLY/05sJlSR1kOT1wOeAK4HntMV3Au8ZWFCSpNWe7c/056QUmhBJngPsDmxSVbsk2QFYt6q+N+DQpAmR5FLgeVV1WZIbq+ohSeYAf6yqDQcdnzSb2QZpdWb7M/35hEqrzDsnmiXWBa5o3w/diZpL87suaUBsgzQL2P5McyZUmggHAc+tqiOAe9uyXwGPG1xI0oT7AfDmvrJ/AbwDLg2WbZBWd7Y/05xd/rTKkvwReFhV3ZPkhqraIMlawOVV9bBBxydNhCSbAd+guVO4JfBrmruDL6qqawYZmzSb2QZpdWf7M/3NHXQAWi0M3Tk5sqfMOydarVTV1Ul2BJ4ObEHTveiHVXXPYCOTZj3bIK3WbH+mP59QaZV550SSNCi2QZIGzYRKEyLJGsDTaBoz75xotZPkt9w3GPh+qmrrKQ5HUg/bIK3ObH+mPxMqTbgkzwTuqar/G3Qs0kRJsnNf0cOAA4ATq+rDAwhJ0jBsg7S6sf2Z/kyotMqSnA0cWlXnJHkz8G/A3cCHq+rIUQ+WZrAkDwPOqKonDzoWabayDdJsZPszvZhQaZUluZ5mMcW7k/wG2AW4BTinqrYcbHTS5EmyPnBFVa036Fik2co2SLOR7c/04ix/mghrAPcm2RqYW1VLAJJsMNiwpImT5LC+ogcDLwa+PYBwJN3HNkirNduf6c+EShPh/4APA5sCpwG0Ddv1gwxKmmCP7tteAXwCOG7qQ5HUwzZIqzvbn2nOhEoTYW/gLcDFwHvbsscDHxtUQNJESjIH+A5wclXdPuh4JN3P3tgGaTVl+zMzOIZKkjpIssy+6pKkqWb7M/35hErjMkx/3mFV1bsmOxZpipye5EVVdcagA5FmO9sgzTK2P9OcCZXGq78/r7S6WwM4NckPaBYO/dPj/ar6h4FFJc1OtkGaTWx/pjkTKo1LVb160DFIU+w3wPsHHYQk2yDNOrY/05xjqDRhkjwI2AjIUFlV/W5wEUmrLskeVXXioOOQNDrbIK1ubH9mDhMqrbIk2wBfBHakeQyd9r9U1ZwBhiatsiQ3V9X8QcchaXi2QVpd2f7MHGsMOgCtFj4J/BDYGLgZWAB8lmYqW2mmy8p3kTRAtkFaXdn+zBA+odIqS3ID8NCqujPJTVW1fpJ1gAuqautBxyetiiS30qxIP2LDVlXfnbqIJPWyDdLqyvZn5nBSCk2EO2h+l+4Erk+yOXAjTV92aaZbC/gcIzdoBfhHmzQ4tkFaXdn+zBAmVJoIPwBeCRwPfAU4naaBO3uAMUkTZYV3uaVpzTZIqyvbnxnCLn8atyQPraprkwSgqirJGsBrgHWBY6tq+UCDlFaRg4Kl6ck2SKs725+Zw4RK49b/Dz3JqVW16yBjkiZakluqat1BxyHp/myDtLqz/Zk5TKg0bv3/0JPcUFUbDDImSdLsYBskabpw2nStCrNxSdKg2AZJmhaclEKrYm6S53Df7DP9207nKUmaLLZBkqYFu/xp3JJczuh3CMvZaSRJk8E2SNJ0YUIlSZIkSePkGCpJkiRJGicTKkmSJEkaJxMqSZIkSRonEypJkiRJGicTKkmSJEkap/8PTe7ntjg2qeMAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Inspect the target variable\n",
    "train_survived_value_counts = df_train.survived.value_counts()\n",
    "test_survived_value_counts = df_test.survived.value_counts()\n",
    "\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_survived_value_counts.plot.bar()\n",
    "train_sex_ratio = train_survived_value_counts[True]/train_survived_value_counts[False]\n",
    "plt.title(f'Train set: survivied ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_survived_value_counts.plot.bar()\n",
    "test_sex_ratio = test_survived_value_counts[True]/test_survived_value_counts[False]\n",
    "plt.title(f'Test set: surived ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next up, let's check whether the ratio of male to female passengers is not too dissimilar between the two sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.156469Z",
     "start_time": "2020-01-14T15:31:36.961543Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3deZglVX3/8feHGQLIMLKNqMjqhqLiMipuoOKSmBiJmIhCRBOESHD5gUFUQFxQQdGoqIAboKiAQhRRIyguaMCMGsRBUECQ3RmWgWET8Pv7o6rhcumluqd7+vb0+/U89Uzdc+pWfe+d7vr2qXPqVKoKSZIkSdL4rTbdAUiSJEnSTGWDSpIkSZImyAaVJEmSJE2QDSpJkiRJmiAbVJIkSZI0QTaoJEmSJGmCbFBpSiWZk2R5kk2nO5aVJcn7khwzjcdPkuOS3JjkZyv52F9KcvDKPGbf8b+XZJfpOr4kTYckH0zy2Wk8/mpJjm/zzo9X8rG/muSAlXnMvuP/IMkrp+v4Ggw2qHQfbeNnaPlLktt6Xo/7D9Wquruq5lXVH6ci3uEk2T3JD1fW8QbQc4HtgYdW1TOnOZYpM1zDtapeVFXHT8K+10ry9SSXJakkzx5j+y2TfDfJDUmuSfKxJHMmsi9Jk5+LevZ7dpJdJzPWdr//luSMyd7vDLID8AzgIVW13XQHM1WGa7hW1fOr6oRJ2Pfafbli21G2XaPvd2To9+RDPdvsleSSJDcn+VaSjVY0Ro3MBpXuo238zKuqecAfgZf2lN3vD9Ukc1d+lBrDZsAfqurW6Q5kogbg56qAHwOvBpZ02P5I4ErgwcCTgBcAe05wX9KsN95cpGm3GXBJVd023YFM1IDknR/R5IobRt2w6o6+35GHAXcAJwEkeTFwAPA3wIbAtcBxUxj7rGeDSuPS9gqckOQrSW4Gdk3yjPaq341Jrk7y8SSrt9vPba+0bN6+/lJb/532qsn/JNlihGM9IMmXk1zX7vvnSTZs69ZN8oX2eFckeU875ODxwBHAc9orNks7fq6z2n2cneSWJP+VZIP2c96U5Jz0DFtMckR73JuS/G+SEXuCkjyr5/v5vyTDXr1LckCSr/aVfTLJR9r1f01yafu9XZJk52H2sQfNH/dDn//Atvzvk5zbxnBWksf1vOeKJG9N8pv2PUcn2SjJf7ef73tJ1m23XS3J19pemBuT/DDJY0b57CMet2+7oZ+TvZJcBFww2vec5O+A/YBd2ph/0ZafleS1PbEe1F7t+1OSY5LMHynWXlV1e1V9rKp+Cvylw1u2AE5ok9zVwPeArSe4L0ljSDOc/MD2XLg0zXCzofPU2mmGgV3fnnvOSbJeksOBpwKfbc8bhw+z32Hf29atn2Y49TVJLk/yrvY88yTgP4Hntvu9puNnODvJwW1uW57k5DbvnNie885O8rCe7T/dcz78eUbvwXhOG/uNSX6Z5FkjbHdwki/1lR2V5LB2/fV9eecfh9nHXjR5d+jzv6Mt/4ckv25j+EmSx/a855ok+yRZ3L7n00kekuT09vN9d+h83eaHrye5tt3XmUkePcpnH/G4fdut2eadNyS5GPjNaN9zkh2BfYDd2ph/3pbf0+vZ/ly+O8kf23g/n2SdkWLtVVW3VtXHJ5grXglcWlU/b1+/FPhKVV1YVXcAhwAvSrLxOPerrqrKxWXYBbgUeEFf2fuAP9P8sq4GrEWToJ4OzAW2BH4H7N1uP5fmqsvm7esvAUuBhcDqwAnAl0Y4/r8D/9UeY077nnlt3beATwEPoOkV+AXwr23d7sAP+/b1z8AvR/msZ7VxbwmsR/MH/YXA89rP8GXgM337W7+textN78QaPd/RMe36JsB1wIvb7+uv28+/wTAxbAksB9bu+e7+1H7u+cAy4JFt3UOAx47wWe7z+dv/n2vbf+cA/wJcDPxVW38F8DPgQTRXua4DFgHbAGvSXDF7Z7vtasBrgXXauiOART3H+hJwcJfj9sU89HPy3fb7X2s833Pf/+Nr2/U92v/TLdp4vwF8oWfbxcA/dfg9uAZ49hjb/DvwBZqf1YcBv6W5oj7ufbm4uNx3YfhctD/wE+Ch7bnomKHfb+DNwNfa38e57Tlo6Lx6NrDrKMca7b3fAT5Bk3ceAvwK2K2t+zfgjL59vQ74+SjHOrs9V2zenud+T5N7tm+PfQLw6Z7tX9OeH1cH3glcDqze1n0Q+Gy7vjnNefwFNOfsl9D0jq83TAyPAm7uOeeu3r73ie2xbgQe3tZtDDxmhM9yn88PbAtcDTyF5vw/dD6e29Zf0/7/bQhsStMj83Pg8e13fxbwtnbbucBuwLz2//rTwNk9x/oqcECX4/bFvCZN3jkNWLfnO+j0Pff9P+7aru/V/p9uRpO3v8V9/3a4EHh5h5/5pcC24/gd+Rmwf8/rTwIf6Xn98Pazvni6f59X1cUeKk3EWVV1alX9papuq6r/rapzququqroEOJomIYzka1W1qKruBI6nOXEP506ak+0jqrkXa1FVLW+vsOwA/L9qruhcQ3N18H49NkOq6otV9eQxPtfnquqSqroB+G/gd1V1ZlXdRdON/qS+/V3f1h1Gc+J8xDD7fA3wzar67/b7+i5wLk3Dqj/GS2iukL2sLXohcGNVLRraBHhckjWr6uqqOn+MzzNkD+BT7f/T3VX1+bb8qT3bfKyq/lRVV9Aksv+pqnOr6naaRu2T2hj/UlXHVNXNbd3BwFOSrD3B4/Z7f1XdUO2wkXF8z8PZBfhwVf2hqm4G3gG8Oslq7b63rqoTO+5rLD+k+Tm+mSb5/pQmkUqaGnvS/AF5VXsuejfwyiShyR0LaBoCd7XnoFs67nfY9ybZDNgO2KfNO1cDH2f0vPOFqnraGMf7bFVdWlXX0/Rs/7aqftSe877GffPOce358U7g/cAGNBfi+u0GnFxVZ7Tn7G8D5wMvGibG39H8kf/StuivgWur6v96NhvKO1dW1W/H+DxD9gSOqKpftOf/o4E1aBo6Q/6zqpZWc4/1z4CfVtV57fn/G9ybd+6qqmOrannP//XTkqw5weP2O6SqbuzJO12/5+HsAnyoqi6rqptoGmS7tD+XVNWjq+rkjvvqJMkjgacBX+wp/g5Nvts6yQOAA2n+hnjAZB5b97JBpYm4vPdFkq2SnNZ24d8EvIemITSS3uEQt9JcdRrOMcAZwIlJrkxzM+hcmis/awBD3f830lyNWdEbLq/tWb9tmNf3xJlkvyQXJFlGc2VtbYb/zJsBrxqKs411W5qrqsP5MvCqdv3VNA1O2hPzq2h6Qq5Jc4Ppozp+rs2At/XF8BCaq41DOn32djjDYe3Qj5uAi9ptRvrsYx23X//PVtfveTgPBS7reX0Z8Fc0fyxNmvZn8r9priY/oN3/g2mGWEiaZO0fp5sA3+45t/yK5m+aDYDP0fSsf60duvX+tJPEdDDSezej6dFY0nPMj7Fy887bk1zYcz5ck5HPvbv2nXsXMv68cwNNA+FNNHnnm0m6XtDaDHhHXwwLmFjemZvkwz155wIgNP/XEzluv/680/V7Hs5weWctmh7IqbIb8P2qunKooKq+BRwKfBO4BDiPZnTRFVMYx6xmg0oTUX2vj6LpWXlEVc0HDqI52a3YQar+XFUHV9VjgGcD/0Bzcr+cpiG2flWt2y7zq+oJI8Q3qZI8j2Yc9U40wwTWoxmqN9xnvpxmGMq6PcvaVfWhYbaF5o/yF6QZN/8ymkQHQFV9p6peQNMouYjme+/icuDdfTE8YIK9M6+hGT7yfOCB3NtbNNJnH+9x7/m/6/A9j/X/fBVNch2yKU1CmeyJITakSdafaH9ml9JcDHjJJB9HElBVRTP89/l955c12x6PO6rqoKraiqZX6R+5tydp1PPGKO+9nOb8s15f3hka+TDVeeeFwBtp8uC6NH+g38bI597PDpN3PjrC7k8AXtyO/ngp8JWhiqo6rap2oGko/JFmuF0XlwMHDXP+n0jvzOtoeteeR5N3tmrLR/rs4z1ub94Z63ueSN65Dbh+jPdNSHtx4Z+BY/vrquqjVfXwqnowcDpN/rtgKuKQDSpNjnVo7u+5Jc0EBXuOsX0nSZ6f5HHtEK2baIZi3F1Vl9NcQfxwkvlpbgp+RO6d7OFa4GFpJ8aYAusAd9GMcV6dZtjbcEPeoOmC/4ckL2x7d9ZM8rwkw14prKpraYbcfQG4sKp+D5DmZt2Xtl33fwZuAe7uGO/RwL8neWoa89p9jRTzaNahmUnoOpremNF6YVb0uGN9z9cCmw8NpRjGV4B9kmze3hR8CM1Nup1u9k0zLe3QkJK/GmF4yVAclwNvaK+krkfT8Dx3AvuS1M2RwAeTbAKQ5EFJXtquvyDJY3tyx13ce768llGGb4303qr6A829MoclWafNO4/MvY9BuBbYZIrzzp00F4T+imYkyEjnkWOBf0yyQ5t31mrXHzzcxm3Pxjk0F4LOq2b4OUk2TvK3bd65g6ZBOZ6888YkC3vO/3/f7mu81gFup8k7a9PcPztVxx3re74W2GKMvPPWJJu2eed9wJfbiwBjmkCueB5Nw++Uvv2sneQx7XewBU1D+MPVDH/XFLBBpcmwL02X8800vSYr/DyG1kOBk2mS2mKa4X9DV852pTmxnk/TJX8SzTAraK7E/J5mSOA1AEl2S3Iuk+PbbSy/p7lZ+iaam2Dvp6oupbnSdSDNCfqPNN/XaL97X6a5mfjLPWVzgP9oj3Md8Exg7y7BVtU5wBtoTqg30NygO9HnsHyB5grcVTT/JyM+OHgSjjvW93wCTcK7Pu1sS30+027zE5ohDzfT3HAOQDukY7SHMV5Mc2VxI+D7wG1tzyFpZhc7tf2cBexIc2V3afs5bwPe2mVfkibkMJrzww/SzDj7M2Cot2hjmntwbqYZPfFtYKhn/KPAa9I8M+6wYfY72ntfRfPH6wU0PQ4ncO+Qv+/SnKf+lOQKuGdm1l9MxocFTqV5/MLFNOezpYzQ2942iHaiuddoKc2wszczsbzzdpph+tfR3P/6xi7BVjNT3Zto/ia4kea8+Gom1pP3OZrPeg3N0LWzpvC4Y33PX6W5mHh9kuHy36dp/m75WbuP62lGWgCQ5OIkO41y/MtocsUGNBeObxtqCKeZPfCUvu13A06s+09X/wCan9vlNPf0fp/RG6JaQenYaJYkSZIk9bGHSpIkSZImyAaVJEmSJE2QDSpJkiRJmiAbVJKkVV6SvZMsSnJHkmN6yndJsrxnuTVJJXlKW39wkjv7tun6kE9J0iyw0hpUfcloeZK7k3yip36HNA/wvDXJmWmeSj5UlySHJrmuXQ4bZcpKSZL6XUUzy9Xnewur6viqmje0AHvRzO71y57NTujdZmhaaUmSAOaurAO1iQpo5senmcv/pPb1hjTTTO5OM2Xle2mmI922fcseNNMSb0Mz9eXpNAnvyNGOueGGG9bmm28+mR9DkjSNfvGLXyytqgXjfd/Qgz2TLARGm7J+N+C4rs+NGY05SJJWHaPln5XWoOrzCuBPNM+HAXg5sLiqhhpYBwNLk2xVVRfQJLjDq2ro2Q6HA69njAbV5ptvzqJFi6bmE0iSVrokl03hvjcDtgP+pa/qpUmup3kO2hFV9elR9rEHzUVANt10U3OQJK0iRss/03UPVf8VwK2Bex66WlW30DwQbevh6tv1rRlGkj3acfKLliwZ9pl3kiQN5zXAT6rqDz1lJwKPARbQXMg7KMmrRtpBVR1dVQurauGCBePuSJMkzUArvUGVZFNge+DYnuJ5wLK+TZcB64xQvwyYN9x9VCYzSdIEvYb75iaq6vyquqqq7q6qnwEfoxllIUkSMD09VK8Bzuq7ArgcmN+33Xzg5hHq5wPLJ2OMuyRJSZ4FPBT42hibFuCkSJKke0xXg+rYvrLFNBNOAPdMWvHwtvx+9e36YiRJ6iDJ3CRrAnOAOUnWTNJ7H/FuwNer6ua+970syXrtbLNPA94EfGPlRS5JGnQrtUGV5JnAxrSz+/U4BXhckp3ahHcQ8Ot2QgqA44B9kmyc5KHAvsAxKylsSdLMdwBwG7A/sGu7fgBAm3f+iftf7APYGbiIZsTEccChVTXcdpKkWWplz/K3G3By/xXAqlqSZCfgCOBLwDk0SWzIUcCWwHnt68+2ZZIkjamqDgYOHqHudmDdEepGnIBCkiRYyQ2qqtpzlLozgK1GqCtgv3aRJEmSpIEwXc+h0hg23/+06Q5h1rn0g3873SFI0rQz/0wPc5A0c03Xc6gkSZIkacazQSVJkiRJE2SDSpIkSZImyAaVJEmSJE2QDSpJkiRJmiAbVJIkSZI0QTaoJEmSJGmCbFBJkiRJ0gTZoJIkSZKkCbJBJUmSJEkTZINKkiRJkiaoU4MqyYIkC3pePz7J+5K8aupCkyRJkqTB1rWH6kTgpQBJNgR+DPwDcGSSfacoNkmSJEkaaF0bVE8Azm7XXwFcVFVbA68B9pyKwCRJkiRp0HVtUK0FLG/XXwB8s13/JbDJZAclSZIkSTNB1wbV74GXJ9kEeBHwvbZ8I+DGqQhMkiRJkgZd1wbVu4FDgUuBs6vqnLb8xcCvxnPAJDsn+W2SW5JcnOQ5bfkOSS5IcmuSM5Ns1vOeJDk0yXXtcliSjOe4kqTZK8neSRYluSPJMT3lmyepJMt7lgN76s0/kqRRze2yUVWdnGRT4KHAuT1VZwBf73qwJC+kaZi9Evg58JC2fEPgZGB34FTgvcAJwLbtW/cAdgS2AQo4HbgEOLLrsSVJs9pVwPtoLgSuNUz9ulV11zDl5h9J0qjG7KFKsnqSa4ANq+pXVfWXobqqOqeqLhjH8d4NvKeqzq6qv1TVlVV1JfByYHFVnVRVtwMHA9sk2ap9327A4VV1Rbv94cBrx3FcSdIsVlUnV9V/AdeN863mH0nSqMZsUFXVncCdNFfmJizJHGAhsCDJRUmuSHJEkrWArenp+aqqW4CL23L669v1rZEkaXJc1ualL7SjJoaMK/8k2aMdWrhoyZIlUxWrJGmAdL2H6hPA25N0GiI4go2A1WmmXX8O8ETgScABwDxgWd/2y4B12vX++mXAvOHGsZvMJEnjsBR4KrAZ8BSavHN8T33n/ANQVUdX1cKqWrhgwYIpClmSNEi6NpCeA2wPXJnkN8AtvZVV9fcd9nFb++8nqupqgCQfoWlQ/RiY37f9fODmdn15X/18YHlV3a/XrKqOBo4GWLhw4Qr1qkmSVm1VtRxY1L68NsnewNVJ5lfVTYwj/0iSZqeuDaqljGPyieFU1Q1JrmD4oYOLacapA5BkbeDhbflQ/TY0E1nQri9GkqTJNZSjhnqgzD+SpFF1neXvdZN0vC8Ab0zyXZr7st4CfAs4BfhQkp2A04CDgF/3THhxHLBPkm/TJLt9aYYhSpI0pnbI+lxgDjAnyZrAXTTD/G6ked7iesDHgR9W1dAwP/OPJGlUXe+hAiDJwiSvbHuQSLL2OO+rei/wv8DvgN/SPMPqkKpaAuwEHALcADwd2LnnfUfRTKd+HvAbmkbXUeOJXZI0qx1AM/R8f2DXdv0AYEvguzRDzH8D3AG8qud95h9J0qg6NYaSbAR8k+bG3QIeSfMcjo8AtwNv7rKfdsbAvdqlv+4MYKv7vampK2C/dpEkaVyq6mCaR3IM5yujvM/8I0kaVdceqo8C1wAbALf2lJ8EvGiyg5IkSZKkmaDrcL0dgB3aiSV6yy8GNp30qCRJkiRpBujaQ7UW8OdhyhfQDPmTJEmSpFmna4Pqx8Bre15XkjnA24DvT3ZQkiRJkjQTdB3ytx/woyRPBdYADge2Bh4IPGuKYpMkSZKkgdaph6qqzgceD/wM+B6wJs2EFE+qqounLjxJkiRJGlydnyFVVdcA75rCWCRJkiRpRun6HKrtRqgqmkkpLq6q6yctKkmSJEmaAbr2UP2QpvEEMDRveu/rvyT5JvDPVXXL5IUnSZIkSYOr6yx/fwv8FtgVeES77AosBnZqlycCH5yCGCVJkiRpIHXtoXof8Oaq6p0i/ZIkS4BDq+opSe4GPgG8cbKDlCRJkqRB1LWH6rHAlcOUX9nWAZwHPHgygpIkSZKkmaBrg+p84J1J1hgqaNff0dYBbAJcM7nhSZIkSdLg6jrkby/gVODKJL+hmZDi8cBfgL9rt9kS+NSkRyhJkiRJA6pTg6qqzkmyBc1EFI+mmdnvK8DxQ7P6VdVxUxalJEmSJA2g8TzY9xbgqCmMRZIkSZJmlM4NqiSbAM8BHkTfvVdV9ZFJjkuSJEmSBl6nBlWSXYDPA3cBS7j3ob606zaoJEmSJM06XWf5ew9wODC/qjavqi16li27HizJD5PcnmR5u1zYU7dDkguS3JrkzCSb9dQlyaFJrmuXw5Kk86eUJM1qSfZOsijJHUmO6SnfNsnpSa5PsiTJSUke0lN/cJI7e/LW8iSd854kadXXtUG1EfDZqrp7Eo65d1XNa5dHAyTZEDgZOBBYH1gEnNDznj2AHYFtgCfQzCy45yTEIkmaHa6ieUj95/vK1wOOBjYHNgNuBr7Qt80JPXlrXlVdMtXBSpJmjq73UH0beDowVUnk5cDiqjoJmiuCwNIkW1XVBcBuwOFVdUVbfzjweuDIKYpHkrQKqaqTAZIsBB7WU/6d3u2SHAH8aOVGJ0maybo2qE4HDk2yNXAecGdv5VCi6ugDST4IXAi8s6p+CGwNnNuzv1uSXNyWX9Bf365vPY5jSpLUxXbA4r6ylya5HrgaOKKqPj3Sm5PsQTOqgk033XTKgpQkDY6uDaqh6dLfMUxdAXM67udtwPnAn4GdgVOTPBGYRzPZRa9lwDrt+rz2dW/dvCSpqt4JMkxmkqQJSfIE4CDgZT3FJ9IMCbyWZqTG15PcWFVfGW4fVXV0uz0LFy6s4baRJK1aOt1DVVWrjbJ0bUxRVedU1c1VdUdVHQv8FHgJsByY37f5fJqx7AxTPx9Y3t+Yao9xdFUtrKqFCxYs6BqaJGkWS/II4DvAm6vqJ0PlVXV+VV1VVXdX1c+AjwGvmK44JUmDp+ukFFOlgNAMr9hmqDDJ2sDDuXfYxX3q2/X+IRmSJI1bO6vsGcB7q+qLY2w+lLckSQI6Nqjaacv3SrK4ndZ8y7Z8/yT/1HEf6yZ5cZI1k8xtn221HfDfwCnA45LslGRNmiEXv24npAA4DtgnycZJHgrsCxwzrk8qSZq12ryzJs0Q9Tk9uWhj4AfAJ6vqfhMdJXlZkvXaPPg04E3AN1Zu9JKkQda1h+rNwAE048J7r8xdCezdcR+r00xZuwRYCrwR2LGqLqyqJcBOwCHADTTj1Hfuee9RwKk0E2L8BjiNe+/rkiRpLAcAtwH7A7u26wcAuwNbAu/qfdZUz/t2Bi6iGYJ+HHBoO2RdkiSg+6QU/wa8vqpOS/K+nvJf0nG2vbbR9NRR6s8AthqhroD92kWSpHGpqoOBg0eofvco73vVVMQjSVp1dO2h2oymZ6jfncBakxeOJEmSJM0cXRtUlwBPHqb8JTTToEuSJEnSrNN1yN+HgSOSPIDmHqpnJPlnmiF4/zJVwUmSJEnSIOvUoKqqLySZC7wfeADwRZoJKd5UVSdMYXySJEmSNLC69lBRVZ8BPpNkQ2C1qvrT1IUlSZIkSYOv63OoVkuyGkBVLQVWS7J7kmdOaXSSJEmSNMC6TkpxGs1zo0gyD1gEfAj4UZLXTFFskiRJkjTQujaonkLzJHmAlwM3AQ8CXg+8dQrikiRJkqSB17VBtQ5wY7v+IuCUqrqTppH18KkITJIkSZIGXdcG1R+BZyVZG3gxcHpbvj5w61QEJkmSJEmDrussfx+hmSp9OXAZ8OO2fDvgvCmIS5IkSZIGXtfnUB2V5BfAJsDpVfWXtupi4MCpCk6SJEmSBtl4nkO1iGZ2PwCSrF5Vp01JVJIkSZI0A3R9DtWbkuzU8/pzwG1JLkzy6CmLTpIkSZIGWNdJKd4ELAFIsh3wT8Crgf8DDp+a0CRJkiRpsHUd8rcxcGm7/lLgpKo6Mcl5wE+mIjBJkiRJGnRde6huAha06y8Evt+u3wmsOdlBSZIkSdJM0LWH6nvAZ5L8CngE8J22fGvgD1MRmCRJkiQNuq49VP8O/BTYEHhFVV3flj8Z+Mp4D5rkkUluT/KlnrIdklyQ5NYkZybZrKcuSQ5Ncl27HJYk4z2uJGl2SrJ3kkVJ7khyTF+d+UeSNGFdn0N1E/DGYcrfNcHjfhL436EXSTYETgZ2B04F3gucAGzbbrIHsCOwDVDA6cAlwJETPL4kaXa5Cngf8GJgraFC848kaUV17aG6R5IHJ9m0dxnn+3cGbuTe+7AAXg4srqqTqup24GBgmyRbtfW7AYdX1RVVdSXNzIKvHW/skqTZqapOrqr/Aq7rqzL/SJJWSNfnUD0wybFJbgOupLlvqnfpJMl84D3Avn1VWwPnDr2oqluAi9vy+9W361sjSdKKmdT8k2SPdmjhoiVLlkxBuJKkQdO1h+rDNMMddgRup3kG1X8AVwCvHMfx3gt8rqou7yufByzrK1sGrDNC/TJg3nDj2E1mkqRxmLT8A1BVR1fVwqpauGDBguE2kSStYrrO8vc3wKuq6idJ7gZ+UVUnJLka2BP42lg7SPJE4AXAk4apXg7M7yubD9w8Qv18YHlVVf+Oqupo4GiAhQsX3q9ekqQek5Z/JEmzU9ceqnWBy9r1ZcAG7fr/AM/suI/nApsDf0xyDfBWYKckvwQW0/SAAZBkbeDhbTn99e36YiRJWjHmH0nSCunaoLoY2LJd/y2wczvc4eXA9SO+676OpklST2yXI4HTaGZcOgV4XJKdkqwJHAT8uqouaN97HLBPko2TPJTmHqxjOh5XkjTLJZnb5pc5wJwkayaZi/lHkrSCujaojgGe0K5/kGaY35+BDwGHdtlBVd1aVdcMLTTDKG6vqiVVtQTYCTgEuAF4OrBzz9uPopnO9jzgNzQNsaM6xi5J0gHAbcD+wK7t+gHmH0nSiur6HKqP9qz/IMljgKcAv6+q8yZy4Ko6uO/1GcBWI2xbwH7tIknSuLQ55+AR6sw/kqQJ6zopxX1U1WXce0+VJEmSNKNsvv9p0x3CrHTpB/92ukOYdJ0f7JtkxyQ/TrK0XX6S5OHt+4wAABV0SURBVB+mMjhJkiRJGmSdeqiS7Au8n+bm3GPa4mcAX05yYFV9eGrCk7Sq8wrhyrcqXh2UJGm6dB3y91Zg76r6TE/Z55P8HHgPzYN/JUmSJGlW6Trkbx5w5jDlZ7Z1kiRJkjTrdG1Q/RfwimHKdwK+OXnhSJIkSdLM0XXI30XA/kmeB/xPW7Ztu3wkyT5DG1bVRyY3REmSJEkaTF0bVK+leeDho9plyA3A63peF2CDSpIkSdKs0PXBvltMdSCSJEmSNNN0fg6VJEmSJOm+bFBJkiRJ0gTZoJIkSZKkCbJBJUmSJEkTNGKDKsnnk6zTrm+XpOuMgJIkSZI0K4zWQ7UrsHa7fiaw/tSHI0mSJEkzx2i9TpcCb0zyPSDAM5LcMNyGVfXjKYhNkiRJkgbaaA2q/wA+A7yd5oG9p4ywXQFzJjkuSZIkSRp4IzaoquobwDeSrAtcD2wN/GllBSZJkiRJg27MWf6q6kbgecDvq+q64ZauB0vypSRXJ7kpye+S7N5Tt0OSC5LcmuTMJJv11CXJoUmua5fDkmS8H1aSpH5Jlvctdyf5RFu3eZLqqz9wumOWJA2OTjP3VdWPkqyR5DXAY2mG+Z0PfLmq7hjH8T4A/GtV3ZFkK+CHSX4FXAacDOwOnAq8FzgB2LZ93x7AjsA27bFPBy4BjhzHsSVJup+qmje0nmRt4FrgpL7N1q2qu1ZqYJKkGaHTc6iSPBb4HfAR4Ok0DZ2PAr9L8piuB6uqxT0NsGqXhwMvBxZX1UlVdTtwMLBN2+gC2A04vKquqKorgcOB13Y9riRJHb2CZnj7T6Y7EEnSzND1wb4fA/4P2LSqnlNVzwE2Bc4F/nM8B0zyqSS3AhcAVwPfprk/69yhbarqFuDitpz++nZ9ayRJmly7AcdVVfWVX5bkiiRfSLLhSG9OskeSRUkWLVmyZGojlSQNhK4NqmcB76iqm4YK2vV3As8ezwGrai9gHeA5NMP87gDmAcv6Nl3Wbscw9cuAecPdR2UykyRNRJJNge2BY3uKlwJPBTYDnkKTl44faR9VdXRVLayqhQsWLJjKcCVJA6Jrg+p2YN1hyh/Y1o1LVd1dVWcBDwPeACwH5vdtNh+4uV3vr58PLB/mCqLJTJI0Ua8BzqqqPwwVVNXyqlpUVXdV1bXA3sCLkvTnLEnSLNW1QXUq8Jkkz0oyp12eDRwFfHMFjj+X5h6qxTQTTgD33BQ8VE5/fbu+GEmSJs9ruG/v1HCGLuQ506wkCejeoHoz8Huam3Rvb5cf0UxU8ZYuO0jyoCQ7J5nXNsheDLwK+AHNQ4Mfl2SnJGsCBwG/rqoL2rcfB+yTZOMkDwX2BY7pGLskSaNK8kxgY/pm90vy9CSPTrJakg2AjwM/rKr+YeqSpFmq67TpNwIvS/II4DE0V+bOr6qLxnGsohnedyRNQ+4y4C3tA4RJshNwBPAl4Bxg5573HgVsCZzXvv5sWyZJ0mTYDTi5qm7uK98SeD/wIOAmmsd2vGolxyZJGmCdGlRD2gbUeBpRve9dQnOz70j1ZwBbjVBXwH7tIknSpKqqPUco/wrwlZUcjiRpBuk65E+SJEmS1McGlSRJkiRNkA0qSZIkSZqgMRtUSeYm2audXU+SJEmS1BqzQVVVdwEfAlaf+nAkSZIkaeboOuTvbODJUxmIJEmSJM00XadN/wxweJLNgF8At/RWVtUvJzswSZIkSRp0XRtUX27//cgwdQXMmZxwJEmSJGnm6Nqg2mJKo5AkSZKkGahTg6qqLpvqQCRJkiRppun8HKokf5PkW0nOT7JJW7Z7kh2mLjxJkiRJGlydGlRJdgFOBH5PM/xvaAr1OcB+UxOaJEmSJA22rj1U+wGvr6r/B9zVU3428MRJj0qSJEmSZoCuDapHAv8zTPlyYP7khSNJkiRJM0fXBtVVwKOGKd8OuHjywpEkSZKkmaNrg+po4ONJntW+3iTJbsBhwKenJDJJkiRJGnBdp00/LMkDgdOBNYEzgTuAD1fVJ6cwPkmSJEkaWF0f7EtVvTPJIcBjaXq2zq+q5VMWmSRJkiQNuM7PoWoVcDtwK3D3eN6YZI0kn0tyWZKbk/wqyd/01O+Q5IIktyY5M8lmPXVJcmiS69rlsCQZZ+ySJA0ryQ+T3J5kebtc2FM3Yn6SJKnrc6jWSPKfwPXAucCvgeuTfCzJmh2PNRe4HNgeeCBwIHBiks2TbAic3JatDywCTuh57x7AjsA2wBOAvwP27HhcSZK62Luq5rXLowE65CdJ0izXdcjfp4EXAbtz7/TpzwA+AKwD/MtYO6iqW4CDe4q+leQPwFOADYDFVXUSQJKDgaVJtqqqC4DdgMOr6oq2/nDg9cCRHeOXJGkiXs7o+UmSNMt1HfL3j8Drqur4qrqkXY4H/hV4xUQOnGQjmqnYFwNb0/R8Afc0vi5uy+mvb9e3ZhhJ9kiyKMmiJUuWTCQ0SdLs9IEkS5P8NMlz27Kx8tN9mIMkafbp2qC6BbhymPIrgdvGe9AkqwPHA8e2V/jmAcv6NltG0/vFMPXLgHnD3UdVVUdX1cKqWrhgwYLxhiZJmp3eBmwJbEzzqJBTkzycsfPTfZiDJGn26dqg+gTwriRrDRW06we2dZ0lWQ34IvBnYO+2eDkwv2/T+cDNI9TPB5ZXVY3n2JIkDaeqzqmqm6vqjqo6Fvgp8BLGzk+SpFluxHuoknyzr+i5wJVJft2+fnz7/rW7HqztUfocsBHwkqq6s61aTHOf1NB2awMPb8uH6rcBft6+3qanTpKkyVZAGDs/SZJmudEmpbiu7/XX+17/YQLH+zTwGOAFVdU7VPAU4ENJdgJOAw4Cft1zw+9xwD5Jvk2T5PZlnD1jkiQNJ8m6wNOBHwF3Aa8EtgPeQjO77Wj5SZI0y43YoKqq103mgdrnduwJ3AFc03P7055VdXybrI4AvgScA+zc8/ajaMa2n9e+/mxbJknSilodeB+wFc0zFi8AdqyqCwHGyE+SpFmu67TpK6yqLqMZPjFS/Rk0yWy4ugL2axdJkiZNVS0BnjpK/Yj5SZKkTg2qJOvRPEPqecCD6JvMoqoeNOmRSZIkSdKA69pDdRzNMzeOBa6luY9JkiRJkma1rg2q5wLbV9UvpzAWSZIkSZpRuj6H6uJxbCtJkiRJs0LXRtKbgQ8k2SbJnKkMSJIkSZJmiq5D/i4C1gJ+CdAz5TkAVWUjS5IkSdKs07VB9RXggcCbcFIKSZIkSQK6N6gWAk+rqt9MZTCSJEmSNJN0vYfqfGD+VAYiSZIkSTNN1wbVAcBHkrwgyUZJ1u9dpjJASZIkSRpUXYf8fbv993vc9/6ptK+dlEKSJEnSrNO1QfW8KY1CkiRJkmagTg2qqvrRVAciSZIkSTNNpwZVkiePVl9Vv5yccCRJkiRp5ug65G8Rzb1SvU/07b2XynuoJEmSJM06XRtUW/S9Xh14EvBO4O2TGpEkSZIkzRBd76G6bJjii5IsA94FfGdSo5IkSZKkGaDrc6hG8gfgiZMRiCRJkiTNNJ0aVP0P8k2yQZLHAR8ALux6sCR7J1mU5I4kx/TV7ZDkgiS3JjkzyWY9dUlyaJLr2uWwJLnfASRJGqckayT5XJLLktyc5FdJ/qat2zxJJVnesxw43TFLkgZH13uolnLfSSigmaDicuCV4zjeVcD7gBcDa92zo2RD4GRgd+BU4L3ACcC27SZ7ADsC27RxnA5cAhw5jmNLkjScuTT5bHvgj8BLgBOTPL5nm3Wr6q7pCE6SNNgm+mDfvwBLgIvGk2Cq6mSAJAuBh/VUvRxYXFUntfUHA0uTbFVVFwC7AYdX1RVt/eHA67FBJUlaQVV1C3BwT9G3kvwBeArwi2kJSpI0YwzKg323Bs7tOd4tSS5uyy/or2/Xtx5uR0n2oOnRYtNNN52qeCVJq6gkGwGPAhb3FF+WZGiExH9U1dIR3msOkqRZZtR7qIa5d2rYZRLimAcs6ytbBqwzQv0yYN5w91FV1dFVtbCqFi5YsGASQpMkzRZJVgeOB45tR0gsBZ4KbEbTY7VOWz8sc5AkzT5j9VANd+9Uv+qwn7EsB+b3lc0Hbh6hfj6wvKrGik2SpE6SrAZ8EfgzsDdAVS2nebg9wLVJ9gauTjK/qm6ankglSYNkrIZQ/71Tvf4aeDMwGTfpLqa5TwqAJGsDD+fe4RaLaSak+Hn7ehvuOxRDkqQJa0c8fA7YCHhJVd05wqZDF/KcaVaSBIzRoBru3qkkTwYOBbYDjqKZka+TJHPbY84B5iRZk6ZBdgrwoSQ7AacBBwG/bodbABwH7JPk2zTJbF/gE12PK0nSGD4NPAZ4QVXdNlSY5OnAjcDvgfWAjwM/rKr+YeqSpFmq84N9k2yR5MvAOcD1wGOr6k1VtWQcxzsAuA3YH9i1XT+g3cdOwCHADcDTgZ173ncUzXTq5wG/oWl0HTWO40qSNKz2uYd70jyo/pqe503tAmwJfJdmCPpvgDuAV01bsJKkgTPmvU9JNqDpMfo34KfAM6pq0ejvGl5VHcx9p6btrTsD2GqEugL2axdJkiZNVV3G6EP4vrKyYpEkzTxjzfL3DuBimocdvqyqnj/RxpQkSZIkrWrG6qF6H82wvCuAvZLsNdxGVfX3kx2YJEmSJA26sRpUxzH2tOmSJEmSNCuNNcvfa1dSHJIkSZI043Se5U+SJEmSdF82qCRJkiRpgmxQSZIkSdIE2aCSJEmSpAmyQSVJkiRJE2SDSpIkSZImyAaVJEmSJE2QDSpJkiRJmiAbVJIkSZI0QTaoJEmSJGmCbFBJkiRJ0gTZoJIkSZKkCbJBJUmSJEkTZINKkiRJkiZoxjSokqyf5JQktyS5LMmrpzsmSdLsYA6SJI1k7nQHMA6fBP4MbAQ8ETgtyblVtXh6w5IkzQLmIEnSsGZED1WStYGdgAOranlVnQV8E/jn6Y1MkrSqMwdJkkYzU3qoHgXcXVW/6yk7F9i+f8MkewB7tC+XJ7lwJcSne20ILJ3uICYih053BJph/FmfHptNwzHNQTOHv5eaLfxZX/lGzD8zpUE1D1jWV7YMWKd/w6o6Gjh6ZQSl+0uyqKoWTncc0lTzZ31WMQfNEP5earbwZ32wzIghf8ByYH5f2Xzg5mmIRZI0u5iDJEkjmikNqt8Bc5M8sqdsG8CbgSVJU80cJEka0YxoUFXVLcDJwHuSrJ3kWcDLgC9Ob2QahkNdNFv4sz5LmINmFH8vNVv4sz5AUlXTHUMnSdYHPg+8ELgO2L+qvjy9UUmSZgNzkCRpJDOmQSVJkiRJg2ZGDPmTJEmSpEFkg0qSJEmSJsgGlSRJkiRNkA0qSeogyRpJDklySZJlbdmLkuw93bFJklZt5qDBZoNKkyLJ6kmek+SV7eu1k6w93XFJk+ijwOOAXYCh2XwWA2+YtogkAeYgzQrmoAHmLH9aYUkeD3wTuAN4WFXNS/ISYLeqeuX0RidNjiRXA4+oqluSXF9V67flN1bVutMcnjRrmYM0G5iDBps9VJoMnwYOqqqtgDvbsh8Bz56+kKRJ92dgbm9BkgU0zySSNH3MQZoNzEEDzAaVJsPWwJfa9QKoqluAtaYtImnynQQcm2QLgCQPAY4AvjqtUUkyB2k2MAcNMBtUmgyXAk/pLUjyNOCiaYlGmhrvoPlZPw9YF/g9cBXw7mmMSZI5SLODOWiAeQ+VVliSvwM+BxwJ7AscAvwb8Pqq+t50xiZNhXaYxdLyBCpNO3OQZhtz0OCxQaVJkeTJwO7AZsDlwGeq6hfTG5W0YpJs2WW7qrpkqmORNDJzkFZF5qCZwwaVJI0gyV9o7snIKJtVVc1ZSSFJkmYJc9DMYYNKE5LkPV22q6qDpjoWSdLsYg6SNEjmjr2JNKxNpjsASdKsZQ6SNDDsoZKkDpLMBfYCtgc2pGcIRlVtN11xSZJWfeagwea06Zo0SdZJskWSLYeW6Y5JmkQfBfYEfkwzRfPXgQcBP5jOoCQ1zEFaxZmDBpg9VFphSR4LHA9sw703Tw49XNEbJbVKSHIl8Iyq+mOSG6tq3SRbAUdV1fbTHZ80W5mDNBuYgwabPVSaDJ8CzgTWB24C1gOOAnabzqCkSfYAmumYAW5L8oCqugB40jTGJMkcpNnBHDTA7KHSCktyA/Cgqrqz56rJ2sBvqmqL6Y5PmgxJfga8pap+nuRU4Lc0f7ztUlWPmd7opNnLHKTZwBw02Oyh0mS4HVi9XV+aZFOan60Npi8kadK9GbirXd8HeDLwUmCPaYtIEpiDNDuYgwaYPVRaYUlOBL5dVcck+SDw9zQJ7o9VteP0RidJWpWZgyRNNxtUmlRJVgNeDcwDjquqW6c5JGnSJNkceALNz/c9qurL0xGPpPsyB2lVZg4aXDaotMKSPBB4E82Nkf2/5C+alqCkSZbk7cBBwGLgtp6q8hkg0vQxB2k2MAcNtrnTHYBWCScBc4BTuO8vubQq2Rd4SlWdP92BSLoPc5BmA3PQALNBpcmwLbBBVd053YFIU+g64NLpDkLS/ZiDNBuYgwaYs/xpMpwFOGWnVnVvAY5OsjDJpr3LdAcmzXLmIM0G5qAB5j1UWmFJHgR8GzgHuLa3rqreMy1BSZMsycuAzwAb9lVVVc2ZhpAkYQ7S7GAOGmwO+dNkOATYhKYren5Pua11rUo+BbwD+CrepyENEnOQZgNz0ACzh0orLMnNwKOq6urpjkWaKkmuBR5aVXdPdyyS7mUO0mxgDhps3kOlyXAJ4M3AWtV9GNg/SaY7EEn3YQ7SbGAOGmD2UGmFJXkr8HLgE9x//PoPpiUoaZIluRx4MPBnmtmW7lFV3hQsTRNzkGYDc9Bgs0GlFZbkDyNUVVVtuVKDkaZIku1HqquqH63MWCTdyxyk2cAcNNhsUEmSJEnSBHkPlSR1kGSNJIckuSTJsrbsRUn2nu7YJEmrNnPQYLNBJUndfBR4HLAL907HvBh4w7RFJEmaLcxBA8whf5LUQZKrgUdU1S1Jrq+q9dvyG6tq3WkOT5K0CjMHDTZ7qCSpmz/T9zD0JAvom21JkqQpYA4aYDaoJKmbk4Bjk2wBkOQhwBE0T62XJGkqmYMGmA0qSRpB382+RwGXAucB6wK/B64C3rPyI5MkrerMQTOH91BJ0giSLKuqB7brN1XV/HZ9AbC0PIFKkqaIOWjmmDv2JpI0a12c5HCamZRWT/I6IEOVSbNaVZ+fnvAkSaswc9AMYQ+VJI0gyaOA/YDNgOcBPxlms6qq56/UwCRJqzxz0Mxhg0qSOkjy/araYbrjkCTNPuagwWaDSpIkSZImyFn+JEmSJGmCbFBJkiRJ0gTZoJIkSZKkCbJBJUmSJEkTZINKkiRJkibo/wMx2PoCY+PrbgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the sex balance\n",
    "train_sex_value_counts = df_train.sex.value_counts()\n",
    "test_sex_value_counts = df_test.sex.value_counts()\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "train_sex_value_counts.plot.bar()\n",
    "train_sex_ratio = train_sex_value_counts['male']/train_sex_value_counts['female']\n",
    "plt.title(f'Train set: male vs female ratio: {train_sex_ratio:.2f}')\n",
    "plt.ylabel('Number of passengers')\n",
    "\n",
    "plt.subplot(122)\n",
    "test_sex_value_counts.plot.bar()\n",
    "test_sex_ratio = test_sex_value_counts['male']/test_sex_value_counts['female']\n",
    "plt.title(f'Test set: male vs female ratio: {test_sex_ratio:.2f}')\n",
    "\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, lets check that the relative number of passenger per class is similar between the train and test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.342397Z",
     "start_time": "2020-01-14T15:31:37.158616Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAEUCAYAAAAspncYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de7hddX3n8ffHhBrNRURSKghJRS0KJXSMtTOtl5Y6thYtks5MvLRotaFa2loYlWcewHihIzxlnLZoIeMFhdYiLXhD6xRFK22lxLFcMkZHVAQUe0CMCQFE/M4f63dkczgnZ2flnGzOOe/X8+zHvX6/tdb+7nVw/fLd67t+K1WFJEmSJGn3PWzUAUiSJEnSXGVCJUmSJEk9mVBJkiRJUk8mVJIkSZLUkwmVJEmSJPVkQiVJkiRJPZlQ6SEvyaIkO5IcMupYtGeSvCXJ+aOOQ5KkcUnemuSdo45Dc5cJlWZcS37GXz9MctfA8kt2d39VdV9VLauqb8xGvJNJ8sokn95bnydJ2vtmerwa2O/nkrx0JmNt+/3dJJfP9H4l7ZnFow5A809VLRt/n+TrwCurasoBIMniqvrB3ohNM8u/naS5bHfHK80Njk3a27xCpb2ulX1dlOT9SbYDL03y79svet9N8q0kf5Zkn7b+4iSVZHVbvrD1fzzJ9iT/nOQnp/isRyb5qyS3t33/S5L9W9++Sd7TPu/mJG9K8rAkPw2cAzyj/Up525Df68okZyTZnGRbkkuTPLr1PSzJ3yS5tcXx6SRPHtj2mCRfbN/n5iR/1Np/PMnH2jbfSfIPA9s8rn3GWJKvJfm9Ccf4/e1YbU9yfZJ/N9C/Nsm/tr6/TnJxko0D/S9Ick373CuTHDHQd3OS1ya5Dtg5xbH46SSXt5hvTfK6SdaZ8WMiSTMpXcn5aUm+muS2JH+ZZN/Wt7SdP7/TzkdXJXl0krOBpwHvbGPI2ZPsd9JtW99+Sd7Xzo03JXlDO1/+DPA/gWe3/d465Hf4XJI3J/l8G5v+NsmjWt/itvztFscVSX5qYNtfT7K1nYdvSvIHrf0nkvxd2+b2JJ8a2ObgJB9qx+urSX53oO+t7Ri+v+3z2iRHDfT/bBt7tqcbuy9JcupA/wvbNt9N8tkkTxnouzXJf02yBfjeFMdiTZJPJbmjrX/yJOvM+DHR/GdCpVF5IfBXwKOAi4AfAH8I7A/8PPArwAm72P7FwGnAfsA3gDdPsd7LgUcCjwMeA7wauLv1XQjcBRwKrAV+DXh5VV0HnAh8tpUajidgv5nk/0zzvX6rvQ4EArxtoO+jwBOBnwCuBy4Y6HsP8IqqWg4cCXymtb8W+Cqwsm13WotlUdvf1cBBwHOA1yY5emCfx7bP2Bf4OPBnbduHAx8E3kl3/P62rUvrfxrwv4BXtmP2buBDSX5sYN/rgV+l+/s9QBuoLwc+AjwWeBLw6SmO14wdE0maBa8F/iPwC3TjyL3cf15/JV2lz0F0Y9eJwPer6mS6c/Mr2xjyoH+0T7Vt6/tLYBvweOBn6c7Pv1lVXwBeA3y67fcnAJK8PMm/TPM9fgt4Sfu8HwMGk7wP042DPwFsBd470Pdu4Lfaefgo4LOt/fXAl1rsjwU2tlgWAR8D/oluHPwV4L8ledbAPl/Y9rsv8Em6JJEkS+jGpr8AHt3iev74Rkl+DngH3bj+GLrx4oNJBqut/gvdePiYiQegJayXA5e07/okYKof5GbsmGhhMKHSqFxZVR+pqh9W1V1VdXVVXVVVP6iqrwKbgGftYvu/qarNVXUv3eBz1BTr3Ut3cntCuxdrc1XtSHIQcDTwR1W1s6pupTupr5/qA6vqgqr6d1P1N++tqv9bVXcCpwPrk6R9z/OrantV3U13on1qkqUDcT4lyfKq+k5V/Z+B9gOBQ6rq+1U1nlT8HLCiqv64tX8FeNeE+D9TVZ+oqvvoBp7xY/TzwA+r6pyqureqLgY+P7DdBuAd7W9yX1W9u7U/bWCdP62qm6vqrkmOwQuAm6rqT6vqnqr6XlU9aLCfhWMiSTPtBOCUqvpmO0+9EfgvSUJ3LloJHNrGrqvbuX8Yk26bZBXwTOCkNjZ9i+7HsF2NTe+pqp+d5vPeU1Vbq2oH8AbgRW3bH1TVe6tqx8D3+9mW3ED3Y+fh7Tx8e0vqxuMfPA+PJya/ACypqjNb+5fpfhwbjP9TVfX3k4xNzwTurqpzW1x/DVwzsN0JwDlV9fk2Nm0CHg48dWCdt7W/1WRj07HAV9rYNz42XT3J8ZzpY6IFwIRKo3LT4EKSw5Jc1i7Bfw94E10iNJXBUoedwLIp1juf7hepDyS5pZUbLAZW0Z2Ixy/pfxd4O3BAv6/zI4Pf68b2GfulKxs5q5U/fA/4Sltn/Du+kC4R+Ua60rent/a3tv18MskNSV7b2lcBh4zH3uJ/Hd2vaeMmHqPxROVA4OZdxL0KeP2EfT+W7pfNydaf6OCB7zelWTgmkjRjWtJ0MPCxgXPhF+j+7fQYuh+xPgP8Tbqy5D9uV2iGMdW2q4AlwNjAZ/4pMz82PTLJo1p5258MnIe30lVXjF/hORZYR3ce/lSSta39DOCbwBVJvpLkpNa+Clg9Yfw4iV2PTePj9zBj03+bsO+V7N7YdMMu+oEflfzN5DHRAmBCpVGpCcvn0ZV8PaGqVtBd3ckef0j3K9HGqnoy3S9nL6Qre7iJ7kS+X1Xt214rqurIKeIb1sED7w8B7gG+Q1du8Tzgl+jK5J7Q1kmL86qqegHw43RlcH/d2r9XVX9UVavpTuKvb6UTNwH/byD2fatqeVU9n+l9i650Zaq4bwLeOGHfj6yqDwyss6vjcxNdqcR0ZvqYSNKMqaoCbgF+acL5cElV3daucpxeVYfRXV35T9x/JWaXY8gutr0J2AE8esLYNF4dMVNj086q2kZXPvcfgV+kOw8f1tYZPw//c1UdQ5fQ/W/g/a19W1X9YVWtoksuTk3y8y3+rZOMTS8cIsZhxqbTJxmbLhlYZybGppk+JloATKj0ULGcrmb8znQTE+zq/qmhJfmlJEckeRjdTar3AvdV1U10vw7+SZIV6W74fUKSZ7ZNvw08Lm1ijN3wW+1q21K6MoEPtEF5OV1ydTvdPV1nDMT4iCQvTrKiuhLG7cB9re/5SQ5tv5Rua+33Af8MfD/JyUmWtKs9P51ksPRhKlcCi5K8qv0St44HlkxsAn4vydPSWdbiWDr57h7kw3RXz05M8mPt+E5WjjLTx0SSZtq5wFuTHAw/mhTn+e39Lyd5ysD48gPuPxd9m+4eqElNtW1VfQ34HHBWkuVtbHpikl8Y2O/BPcamlyV5UpJldOXVF7X25XT3Fd9OV8XwloEYlyZZn2QF3dg5eB5+QZKfnOQ8fGXrf00bmxYnOTIDkyLtwj8Aj0iyoW33n4E1A/2bgN9PN6nS+Nj0giSPHPIYfBB4Qhv7xsemp02y3kwfEy0AJlR6qDgZOJ7u5HQe95/s99SBdDegfg/YQlf+9/7W91K6k+X/Be4ALub+soS/B/4fXUngrQBJjk8yWM89mQvoJrv4FrCI7gZi6GrIv9leW+hu2B10PHBjKy94BfCbrf2ngE/R/WL5j3T3Ll1Z3XSwz6O7YfnrwG10x23FdAekqu6hu1L3u+17/2e6m4jvaf1XAa+iuzH4DuDLdMdqKO1Xz+fQ/UL3b237ya4gzegxGTY+SdoNZ9GNG59KNyvtPwHjycFBwIfoxq3r6c6j41fy30b3A9sdSc6aZL+72vZFdBM2bKWrcLiI+0v+/o7unP9vSW4GSPKKJIP3wU7mArqx7xbgh3RjLnSlh2N0ZXjX0RKiAb9NVyK4ja6q4PjW/mS6yYa20yVCf1JVn2s/gD0P+A9tuzG6sWSqsvwfafc9HQf8Pt3YcyzwCe4fm/4R+AO6se67dGPLixnyql1V3UE3Nq2nG5u+RFe5MtGMHpNhYtPcl+7Hc0l7KsmVwDur6vxRx7K72mD8P6vqgmlXliTNGUk+RzeZw4WjjmV3tR8x31pV7592ZWmEvEIlLUBJnp3kgFZW8Qq6GvH/Peq4JEkLV5JfbGWV+yTZQHfP09+POi5pOounX0XSPPRkujKSpXSzHq2rqm+PNiRJ0gJ3ON3Y9Ei6mV+Pq6rbRhuSNL2hS/6SrKd7dsEhdHWlL6uqz6Z7kOjbW/tVrf3Gtk3opjh+ZdvNu4DXl3WGkiRJkuaBoUr+kjwHOJNuKsnldFN8fjXJ/nQ3/J8G7Ads5oGTCWygu6lwDXAkcAwzNHubJEmSJI3asPdQvRF4U5vB5YdVdUtV3UI3G8uWqrq4PU16I7Amyfic/ccDZ1fVzW39s4GXzexXkCQtZG2K/s1J7kly/hTrvCFJJfnlgbYkOTPJ7e11VquskCRpaNPeQ5Xuyd1rgQ8n+QrdE7w/CLyWrtb1R9NIV9WdSW5o7Vsn9rf3h0/3mfvvv3+tXr16+G8hSZrzPv/5z99WVSt7bPpNumfFPBd4xMTOJIcCv0H3OINBg1UURXfz+1fpnj20S45TkrSw7GqMGmZSigOAfegGo2fQPcjsQ8CpdM8VGJuw/ja6skBa/7YJfcuSZOJ9VG02lw0AhxxyCJs3bx4iNEnSfJHkxj7bVdUlbfu1wOMmWeUc4PXAOya0/6iKom1/NvA7DJFQrV692nFKkhaQXY1Rw5T83dX+98+r6ltttpX/Qffgth08+EGiK+geasYk/SuAHZNNSlFVm6pqbVWtXbmyzw+UkiQ9UJL/BHy/qj42SXevKgpJkgZNm1C1J0vfzORPot5CVyoBQJKldM8M2DJZf3u/BUmSZlmSZcAfA6+ZYpUpqyim2N+Gdq/W5rGxicUZkqSFathJKd4D/H572Nqj6QanjwKXAkckWZdkCXA6cG1VbW3bvQ84KclBSQ4ETgbOn9FvIEnS5N4IXFBVX5uif+gqCrCSQpI0uWETqjcDVwNfBr4IfAE4o6rGgHXAGcAdwNOB9QPbnQd8BLgOuB64rLVJkjTbjgb+IMmtSW4FDgY+kOT1rd8qCknSHhtmUgqq6l7g1e01se9y4LAHbdT1FfC69pIkacYlWUw3ni0CFrWKiR/QJVT7DKx6NXAS8PG2PF5F8TG6svaTgT/fW3FLkuaHoRIqSZIewk4F3jCw/FLgjVW1cXClJPcBd1TVjtZ0HvB4uioKgHdiFYUkaTeZUEmS5rSWOG0cYr3VE5atopAk7bFh76GSJEmSJE1gQiVJkiRJPVnyN4nVp1w26hBG6utv/bVRhyBJmoJjlGOUpIcWr1BJkiRJUk8mVJIkSZLUkwmVJEmSJPVkQiVJkiRJPZlQSZIkSVJPJlSSJEmS1JMJlSRJkiT1ZEIlSZIkST2ZUEmSJElSTyZUkiRJktSTCZUkSZIk9WRCJUmSJEk9mVBJkiRJUk8mVJIkSZLUkwmVJEmSJPVkQiVJkiRJPZlQSZIkSVJPJlSSJEmS1JMJlSRJkiT1ZEIlSZIkST2ZUEmS5rQkJybZnOSeJOcPtP9ckr9P8p0kY0kuTvLYgf4kOTPJ7e11VpKM5EtIkuYsEypJ0lz3TeAtwLsntD8a2ASsBlYB24H3DPRvAI4F1gBHAscAJ8xyrJKkeWbxqAOQJGlPVNUlAEnWAo8baP/44HpJzgE+M9B0PHB2Vd3c+s8Gfgc4d7ZjliTNH16hkiQtFM8EtgwsHw5cM7B8TWubVJINrbRw89jY2CyFKEmaa4ZKqJJ8OsndSXa015cG+o5OsjXJziRXJFk10Gd9uiRp5JIcCZwOvHageRmwbWB5G7BsqnGqqjZV1dqqWrty5crZC1aSNKfszhWqE6tqWXv9FECS/YFLgNOA/YDNwEUD21ifLkkaqSRPAD4O/GFVfXagawewYmB5BbCjqmpvxidJmtv2tOTvOGBLVV1cVXcDG4E1SQ5r/T+qT6+qW4CzgZft4WdKkjSUVjVxOfDmqrpgQvcWuh/8xq3hgSWBkiRNa3cSqv+e5LYk/5jk2a3tAfXnVXUncAP316APXZ9ubbokqY8ki5MsARYBi5IsaW0HAZ8C3l5Vk0008T7gpCQHJTkQOBk4f68FLkmaF4ZNqF4PPB44iG4K2o8kOZQH15/Tlpe390PXp1ubLknq6VTgLuAU4KXt/anAK+nGrjcM3AO8Y2C784CPANcB1wOXtTZJkoY21LTpVXXVwOJ7k7wIeB4Prj+nLW9v761PlyTNqqraSFdyPpk37mK7Al7XXpIk9dL3HqoCwoT68yRLgUO5vwbd+nRJkiRJ89a0CVWSfZM8d6Am/SV0z/L4BHApcESSda1+/XTg2qra2ja3Pl2SJEnSvDVMyd8+wFuAw4D7gK3AsVX1JYAk64BzgAuBq4D1A9ueR1e/fl1bfifWp0uSJEmaJ6ZNqKpqDHjaLvovp0u2JuuzPl2SJEnSvLWnz6GSJEmSpAXLhEqSJEmSejKhkiRJkqSeTKgkSZIkqScTKkmSJEnqyYRKkiRJknoyoZIkSZKknkyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiRJkqSeTKgkSZIkqScTKkmSJEnqyYRKkiRJknoyoZIkzWlJTkyyOck9Sc6f0Hd0kq1Jdia5Ismqgb4kOTPJ7e11VpLs9S8gSZrTTKgkSXPdN4G3AO8ebEyyP3AJcBqwH7AZuGhglQ3AscAa4EjgGOCEvRCvJGkeMaGSJM1pVXVJVX0QuH1C13HAlqq6uKruBjYCa5Ic1vqPB86uqpur6hbgbOBleylsSdI8YUIlSZqvDgeuGV+oqjuBG1r7g/rb+8ORJGk3mFBJkuarZcC2CW3bgOVT9G8Dlk11H1WSDe1erc1jY2MzHqwkaW4yoZIkzVc7gBUT2lYA26foXwHsqKqabGdVtamq1lbV2pUrV854sJKkucmESpI0X22hm3ACgCRLgUNb+4P62/stSJK0G0yoJElzWpLFSZYAi4BFSZYkWQxcChyRZF3rPx24tqq2tk3fB5yU5KAkBwInA+eP4CtIkuYwEypJ0lx3KnAXcArw0vb+1KoaA9YBZwB3AE8H1g9sdx7wEeA64HrgstYmSdLQFo86AEmS9kRVbaSbEn2yvsuBw6boK+B17SVJUi+7dYUqyROT3J3kwoE2n0IvSZIkaUHa3ZK/twNXjy/4FHpJkiRJC9nQCVWS9cB3gU8ONPsUekmSJEkL1lAJVZIVwJvoZkAa5FPoJUmSJC1Yw16hejPwrqq6aUL7jD2F3ifQS5IkSZprpk2okhwF/DLwtkm6Z+wp9D6BXpIkSdJcM8y06c8GVgPfaBeWltE9OPEpwLl090kBu3wK/b+0ZZ9CL0mSJGneGKbkbxNdknRUe51L9/DD5+JT6CVJkiQtYNNeoaqqncDO8eUkO4C72xPoSbIOOAe4ELiKBz+F/vF0T6EHeCc+hV6SJEnSPDFMyd8DtCfSDy77FHpJkiRJC9LuPthXkiRJktSYUEmSJElSTyZUkiRJktSTCZUkSZIk9WRCJUmSJEk9mVBJkiRJUk8mVJIkSZLUkwmVJEmSJPVkQiVJkiRJPZlQSZIkSVJPJlSSJEmS1JMJlSRJkiT1ZEIlSZIkST2ZUEmS5rUkq5N8LMkdSW5Nck6Sxa3v6CRbk+xMckWSVaOOV5I0t5hQSZLmu3cA/wY8FjgKeBbw6iT7A5cApwH7AZuBi0YVpCRpbjKhkiTNdz8JfKCq7q6qW4G/Aw4HjgO2VNXFVXU3sBFYk+Sw0YUqSZprTKgkSfPdnwLrkzwyyUHAr3J/UnXN+EpVdSdwQ2uXJGkoJlSSpPnuM3RJ0veAm+lK+z4ILAO2TVh3G7B8sp0k2ZBkc5LNY2NjsxiuJGkuMaGSJM1bSR4GfILuXqmlwP7Ao4EzgR3AigmbrAC2T7avqtpUVWurau3KlStnL2hJ0pxiQiVJms/2Aw4Gzqmqe6rqduA9wPOALcCa8RWTLAUObe2SJA3FhEqSNG9V1W3A14BXJVmcZF/geLp7py4FjkiyLskS4HTg2qraOrqIJUlzjQmVJGm+Ow74FWAM+ArwA+CPqmoMWAecAdwBPB1YP6ogJUlz0+JRByBJ0myqqn8Fnj1F3+WA06RLknrzCpUkSZIk9WRCJUmSJEk9mVBJkiRJUk8mVJIkSZLUkwmVJEmSJPU0VEKV5MIk30ryvSRfTvLKgb6jk2xNsjPJFUlWDfQlyZlJbm+vs5JkNr6IJEmSJO1tw16h+u/A6qpaAbwAeEuSpybZH7gEOI3uafSbgYsGttsAHEv3JPojgWOAE2YodkmSJEkaqaESqqraUlX3jC+216F0D0vcUlUXV9XdwEZgTZLxZ3ocD5xdVTdX1S3A2cDLZjB+SZIkSRqZoR/sm+QddMnQI4AvAB+je7r8NePrVNWdSW4ADge2tv+9ZmA317Q2SZIk7abVp1w26hBG6utv/bVRhyA9yNCTUlTVq4HlwDPoyvzuAZYB2yasuq2txyT924Blk91HlWRDks1JNo+NjQ3/DSRJkiRpRHZrlr+quq+qrgQeB7wK2AGsmLDaCmB7ez+xfwWwo6pqkn1vqqq1VbV25cqVuxOWJEmSJI1E32nTF9PdQ7WFbsIJAJIsHWhnYn97vwVJkiRJmgemTaiS/HiS9UmWJVmU5LnAi4BPAZcCRyRZl2QJcDpwbVVtbZu/DzgpyUFJDgROBs6flW8iSZIkSXvZMJNSFF1537l0CdiNwGuq6kMASdYB5wAXAlcB6we2PQ94PHBdW35na5MkSZKkOW/ahKqqxoBn7aL/cuCwKfoKeF17SZIkSdK80vceKkmSJEla8EyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiTNe0nWJ/likjuT3JDkGa396CRbk+xMckWSVaOOVZI0t5hQSZLmtSTPAc4EXg4sB54JfDXJ/sAlwGnAfsBm4KJRxSlJmpsWjzoASZJm2RuBN1XV59ryLQBJNgBbquritrwRuC3JYVW1dSSRSpLmHK9QSZLmrSSLgLXAyiRfSXJzknOSPAI4HLhmfN2quhO4obVPtq8NSTYn2Tw2NrY3wpckzQFeoZImWH3KZaMOYWS+/tZfG3UI0kw7ANgH+A3gGcC9wIeAU4FlwMTMaBtdWeCDVNUmYBPA2rVra5bilSTNMV6hkiTNZ3e1//3zqvpWVd0G/A/gecAOYMWE9VcA2/difJKkOc6ESpI0b1XVHcDNwGRXlLYAa8YXkiwFDm3tkiQNxYRKkjTfvQf4/SQ/nuTRwGuAjwKXAkckWZdkCXA6cK0TUkiSdocJlSRpvnszcDXwZeCLwBeAM6pqDFgHnAHcATwdWD+qICVJc5OTUkiS5rWquhd4dXtN7LscOGyvByVJmjdMqCRJkqQ5wJmIH5os+ZMkSZKknkyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ6c5U+SmoU8exI8tGdQkiTpocorVJIkSZLUkwmVJEmSJPVkQiVJkiRJPZlQSZIkSVJP0yZUSR6e5F1JbkyyPckXkvzqQP/RSbYm2ZnkiiSrBvqS5Mwkt7fXWUkyW19GkiRJkvamYa5QLQZuAp4FPAo4DfhAktVJ9gcuaW37AZuBiwa23QAcC6wBjgSOAU6YseglSZIkaYSmnTa9qu4ENg40fTTJ14CnAo8BtlTVxQBJNgK3JTmsqrYCxwNnV9XNrf9s4HeAc2fyS0iSJEnSKOz2PVRJDgCeBGwBDgeuGe9rydcNrZ2J/e394UwiyYYkm5NsHhsb292wJEmSJGmv262EKsk+wF8C721XoJYB2yastg1Y3t5P7N8GLJvsPqqq2lRVa6tq7cqVK3cnLEmSJEkaiaETqiQPAy4Avg+c2Jp3ACsmrLoC2D5F/wpgR1VVr2glSZIk6SFkqISqXVF6F3AAsK6q7m1dW+gmnBhfbylwaGt/UH97vwVJkiRJmgeGvUL1F8CTgedX1V0D7ZcCRyRZl2QJcDpwbSsHBHgfcFKSg5IcCJwMnD8zoUuSJEnSaA3zHKpVdFOdHwXcmmRHe72kqsaAdcAZwB3A04H1A5ufB3wEuA64HristUmSJEnSnDfMtOk3AlM+jLeqLgcOm6KvgNe1lyRJkiTNK7s9bbokSZIkqWNCJUlaEJI8McndSS4caDs6ydYkO5Nc0crcJUkamgmVJGmheDtw9fhCkv2BS4DTgP2AzcBFowlNkjRXmVBJkua9JOuB7wKfHGg+DthSVRdX1d3ARmBNkknvC5YkaTImVJKkeS3JCuBNdI/uGHQ4cM34QlXdCdzQ2iVJGooJlSRpvnsz8K6qumlC+zJg24S2bcDyyXaSZEOSzUk2j42NzUKYkqS5yIRKkjRvJTkK+GXgbZN07wBWTGhbAWyfbF9Vtamq1lbV2pUrV85soJKkOWva51BJkjSHPRtYDXwjCXRXpRYleQpwLnD8+IpJlgKHAlv2epSSpDnLK1SSpPlsE12SdFR7nQtcBjwXuBQ4Ism6JEuA04Frq2rrqIKVJM09XqGSJM1bVbUT2Dm+nGQHcHdVjbXldcA5wIXAVcD6UcQpSZq7TKgkSQtGVW2csHw54DTpkqTeLPmTJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiRJkqSeTKgkSZIkqScTKkmSJEnqyYRKkiRJknoyoZIkSZKknkyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiRJkqSehkqokpyYZHOSe5KcP6Hv6CRbk+xMckWSVQN9SXJmktvb66wkmeHvIEmSJEkjMewVqm8CbwHePdiYZH/gEuA0YD9gM3DRwCobgGOBNcCRwDHACXsWsiRJkiQ9NAyVUFXVJVX1QeD2CV3HAVuq6uKquhvYCKxJcljrPx44u6purqpbgLOBl81I5JIkSZI0Ynt6D9XhwDXjC1V1J3BDa39Qf3t/OJIkSZI0D+xpQrUM2DahbRuwfIr+bcCyye6jSrKh3ae1eWxsbA/DkiRJkqTZt6cJ1Q5gxYS2FcD2KfpXADuqqibuqKadqJ0AAAXbSURBVKo2VdXaqlq7cuXKPQxLkiRJkmbfniZUW+gmnAAgyVLg0Nb+oP72fguSJO0FSR6e5F1JbkyyPckXkvzqQP+UM9VKkjSMYadNX5xkCbAIWJRkSZLFwKXAEUnWtf7TgWuramvb9H3ASUkOSnIgcDJw/ox/C0mSJrcYuAl4FvAoullpP5Bk9RAz1UqSNK1hr1CdCtwFnAK8tL0/tarGgHXAGcAdwNOB9QPbnQd8BLgOuB64rLVJkjTrqurOqtpYVV+vqh9W1UeBrwFPZfqZaiVJmtbiYVaqqo10A81kfZcDkw4+7V6p17WXJEkjleQA4El05eevYsJMtUnGZ6rdOvkeJEl6oD29h0qSpDkhyT7AXwLvbaXp081UO3F7Z6OVJD2ICZUkad5L8jDgAuD7wImtebqZah/A2WglSZMxoZIkzWvt2YfvAg4A1lXVva1ruplqJUmalgmVJGm++wvgycDzq+qugfbpZqqVJGlaJlSSpHmrPVfqBOAo4NYkO9rrJUPMVCtJ0rSGmuVPkqS5qKpuBLKL/ilnqpUkaRheoZIkSZKknkyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiRJkqSeTKgkSZIkqScTKkmSJEnqyYRKkiRJknoyoZIkSZKknkyoJEmSJKknEypJkiRJ6smESpIkSZJ6MqGSJEmSpJ5MqCRJkiSpJxMqSZIkSerJhEqSJEmSejKhkiRJkqSeTKgkSZIkqScTKkmSJEnqadYTqiT7Jbk0yZ1Jbkzy4tn+TEmShuU4JUnaE4v3wme8Hfg+cABwFHBZkmuqaste+GxJkqbjOCVJ6m1Wr1AlWQqsA06rqh1VdSXwYeA3Z/NzJUkahuOUJGlPpapmb+fJzwD/VFWPGGj7r8Czqur5E9bdAGxoiz8FfGnWAnvo2x+4bdRBaCT82y9sC/3vv6qqVu7ND3Sc6mWh/3e60Pn3X7gW+t9+yjFqtkv+lgHbJrRtA5ZPXLGqNgGbZjmeOSHJ5qpaO+o4tPf5t1/Y/PuPhOPUbvK/04XNv//C5d9+arM9KcUOYMWEthXA9ln+XEmShuE4JUnaI7OdUH0ZWJzkiQNtawBv9JUkPRQ4TkmS9sisJlRVdSdwCfCmJEuT/Dzw68AFs/m588CCLylZwPzbL2z+/fcyx6le/O90YfPvv3D5t5/CrE5KAd3zPYB3A88BbgdOqaq/mtUPlSRpSI5TkqQ9MesJlSRJkiTNV7N9D5UkSZIkzVsmVJIkSZLUkwnViCX5mSS/keSRSRYlOTHJ25IcM+rYJM2eJIckeWGSJ03S96JRxCRNxnFKWpgcp4ZnQjVCSV4BfAz4M+AfgNcDh9M9aPL9SX57hOFphNo/Wk4fdRyaHUl+Bbge2Aj8a5J3JFk0sMp5IwlMmsBxSlNxnJrfHKd2j5NSjFCSrcALgABfBH6hqv6p9T0XOKuq1owwRI1IkocDO6tq0bQra85J8nng9Kq6LMkBwIXAPcBxVfX9JNuravloo5QcpzQ1x6n5zXFq95hQjVCSbVX1qPb+TmBZtT9IkocB36mqfUcZo2ZPknfvonsx8BIHqvlp8P/7bXkx3WC1P90/Xr/tQKWHAsephc1xauFynNo9lvyN1p1J9mnvz68HZrePAH44gpi097wYuAu4ZZLXzSOMS7PvjiQHjy9U1Q+AFwHfAC4H/AeKHiocpxY2x6mFy3FqNywedQAL3CeBJwBfrKrfm9B3DHDt3g9Je9F1wCeq6sMTO5IsAU7Z+yFpL7kceDnwpvGG9g/V305yLvBzowpMmsBxamFznFq4HKd2gyV/D1FJVtL9t3vbqGPR7Ejye8AtVfXBSfoWAadW1Rv3fmSabUl+DFhcVTun6D+kqr6xl8OSdovj1PznOLVwOU7tHhMqSZIkSerJe6gkSZIkqScTKkmSJEnqyYRKkiRJknoyoZIkSZKknkyoJEmSJKmn/w/RNSw0Q+/2EgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Check the class balance\n",
    "train_pclass_value_counts = df_train.pclass.value_counts()\n",
    "test_pclass_value_counts = df_test.pclass.value_counts()\n",
    "\n",
    "plt.figure(figsize=(12, 4))\n",
    "\n",
    "plt.subplot(121)\n",
    "plt.title('Train set: passenger class')\n",
    "train_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.subplot(122)\n",
    "plt.title('Test set: passenger class')\n",
    "test_pclass_value_counts.plot.bar()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the above diagnostics, we are satisfied that, at least in these few categories, the train and test are similar enough, and we can move forward.\n",
    "\n",
    "## Feature engineering\n",
    "\n",
    "In this section we will use `vaex` to create meaningful features that will be used to train a classification model. To start with, let's get a high level overview of the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.431448Z",
     "start_time": "2020-01-14T15:31:37.344290Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pclass</th>\n",
       "      <th>survived</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "      <th>boat</th>\n",
       "      <th>body</th>\n",
       "      <th>home_dest</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>dtype</th>\n",
       "      <td>int64</td>\n",
       "      <td>bool</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>int64</td>\n",
       "      <td>int64</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>str</td>\n",
       "      <td>float64</td>\n",
       "      <td>str</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>841</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1047</td>\n",
       "      <td>1046</td>\n",
       "      <td>233</td>\n",
       "      <td>1046</td>\n",
       "      <td>380</td>\n",
       "      <td>102</td>\n",
       "      <td>592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>206</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>814</td>\n",
       "      <td>1</td>\n",
       "      <td>667</td>\n",
       "      <td>945</td>\n",
       "      <td>455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.3075453677172875</td>\n",
       "      <td>0.3744030563514804</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>29.565299286563608</td>\n",
       "      <td>0.5100286532951289</td>\n",
       "      <td>0.3982808022922636</td>\n",
       "      <td>--</td>\n",
       "      <td>32.926091013384294</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>159.6764705882353</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.833269</td>\n",
       "      <td>0.483968</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>14.162</td>\n",
       "      <td>1.07131</td>\n",
       "      <td>0.890852</td>\n",
       "      <td>--</td>\n",
       "      <td>50.6783</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>96.2208</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>0.1667</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>--</td>\n",
       "      <td>0</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>1</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>3</td>\n",
       "      <td>True</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>80</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>--</td>\n",
       "      <td>512.329</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>327</td>\n",
       "      <td>--</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   pclass            survived  name   sex                 age  \\\n",
       "dtype               int64                bool   str   str             float64   \n",
       "count                1047                1047  1047  1047                 841   \n",
       "NA                      0                   0     0     0                 206   \n",
       "mean   2.3075453677172875  0.3744030563514804    --    --  29.565299286563608   \n",
       "std              0.833269            0.483968    --    --              14.162   \n",
       "min                     1               False    --    --              0.1667   \n",
       "max                     3                True    --    --                  80   \n",
       "\n",
       "                    sibsp               parch ticket                fare  \\\n",
       "dtype               int64               int64    str             float64   \n",
       "count                1047                1047   1047                1046   \n",
       "NA                      0                   0      0                   1   \n",
       "mean   0.5100286532951289  0.3982808022922636     --  32.926091013384294   \n",
       "std               1.07131            0.890852     --             50.6783   \n",
       "min                     0                   0     --                   0   \n",
       "max                     8                   9     --             512.329   \n",
       "\n",
       "      cabin embarked boat               body home_dest  \n",
       "dtype   str      str  str            float64       str  \n",
       "count   233     1046  380                102       592  \n",
       "NA      814        1  667                945       455  \n",
       "mean     --       --   --  159.6764705882353        --  \n",
       "std      --       --   --            96.2208        --  \n",
       "min      --       --   --                  1        --  \n",
       "max      --       --   --                327        --  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imputing\n",
    "\n",
    "We notice that there are 3 columns that have missing data, so our first task will be to impute the missing values with suitable substitutes. This is our strategy:\n",
    "\n",
    "- age: impute with the median age value\n",
    "- fare: impute with the mean fare of the 5 most common values.\n",
    "- cabin: impute with \"M\" for \"Missing\"\n",
    "- Embarked: Impute with with the most common value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.445583Z",
     "start_time": "2020-01-14T15:31:37.432706Z"
    }
   },
   "outputs": [],
   "source": [
    "# Handle missing values\n",
    "\n",
    "# Age - just do the mean of the training set for now\n",
    "median_age = df_train.percentile_approx(expression='age', percentage=50.0)\n",
    "df_train['age'] = df_train.age.fillna(value=median_age)\n",
    "\n",
    "# Fare: the mean of the 5 most common ticket prices.\n",
    "fill_fares = df_train.fare.value_counts(dropna=True)\n",
    "fill_fare = fill_fares.iloc[:5].index.values.mean()\n",
    "df_train['fare'] = df_train.fare.fillna(value=fill_fare)\n",
    "\n",
    "# Cabing: this is a string column so let's mark it as \"M\" for \"Missing\"\n",
    "df_train['cabin'] = df_train.cabin.fillna(value='M')\n",
    "\n",
    "# Embarked: Similar as for Cabin, let's mark the missing values with \"U\" for unknown\n",
    "fill_embarked = df_train.embarked.value_counts(dropna=True).index[0]\n",
    "df_train['embarked'] = df_train.embarked.fillna(value=fill_embarked)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### String processing\n",
    "\n",
    "Next up, let's engineer some new, more meaningful features out of the \"raw\" data that is present in the dataset. \n",
    "Starting with the name of the passengers, we are going to extract the titles, as well as we are going to count the number of words a name contains. These features can be a loose proxy to the age and status of the passengers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.468668Z",
     "start_time": "2020-01-14T15:31:37.447940Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = name_title\n",
       "Length: 1,047 dtype: str (column)\n",
       "---------------------------------\n",
       "   0      Mr\n",
       "   1      Mr\n",
       "   2     Mrs\n",
       "   3    Miss\n",
       "   4      Mr\n",
       "    ...     \n",
       "1042  Master\n",
       "1043     Mrs\n",
       "1044  Master\n",
       "1045      Mr\n",
       "1046      Mr"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = name_num_words\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  3\n",
       "   1  4\n",
       "   2  5\n",
       "   3  4\n",
       "   4  4\n",
       "  ...  \n",
       "1042  4\n",
       "1043  6\n",
       "1044  4\n",
       "1045  4\n",
       "1046  3"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Engineer features from the names\n",
    "\n",
    "# Titles\n",
    "df_train['name_title'] = df_train['name'].str.replace('.* ([A-Z][a-z]+)\\..*', \"\\\\1\", regex=True)\n",
    "display(df_train['name_title'])\n",
    "\n",
    "# Number of words in the name\n",
    "df_train['name_num_words'] = df_train['name'].str.count(\"[ ]+\", regex=True) + 1\n",
    "display(df_train['name_num_words'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the cabin colum, we will engineer 3 features:\n",
    " - \"deck\": extacting the deck on which the cabin is located, which is encoded in each cabin value;\n",
    " - \"multi_cabin: a boolean feature indicating whether a passenger is allocated more than one cabin\n",
    " - \"has_cabin\": since there were plenty of values in the original cabin column that had missing values, we are just going to build a feature which tells us whether a passenger had an assigned cabin or not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.505514Z",
     "start_time": "2020-01-14T15:31:37.470010Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = deck\n",
       "Length: 1,047 dtype: str (column)\n",
       "---------------------------------\n",
       "   0  M\n",
       "   1  B\n",
       "   2  M\n",
       "   3  M\n",
       "   4  M\n",
       "  ...  \n",
       "1042  M\n",
       "1043  M\n",
       "1044  M\n",
       "1045  B\n",
       "1046  M"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = multi_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  0\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  1\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = has_cabin\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  1\n",
       "   3  1\n",
       "   4  1\n",
       "  ...  \n",
       "1042  1\n",
       "1043  1\n",
       "1044  1\n",
       "1045  1\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#  Extract the deck\n",
    "df_train['deck'] = df_train.cabin.str.slice(start=0, stop=1)\n",
    "display(df_train['deck'])\n",
    "\n",
    "# Passengers under which name have several rooms booked, these are all for 1st class passengers\n",
    "df_train['multi_cabin'] = ((df_train.cabin.str.count(pat='[A-Z]', regex=True) > 1) &\\\n",
    "                           ~(df_train.deck == 'F')).astype('int')\n",
    "display(df_train['multi_cabin'])\n",
    "\n",
    "# Out of these, cabin has the most missing values, so let's create a feature tracking if a passenger had a cabin\n",
    "df_train['has_cabin'] = df_train.cabin.notna().astype('int')\n",
    "display(df_train['has_cabin'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### More features\n",
    "\n",
    "There are two features that give an indication whether a passenger is travelling alone, or with a famly. \n",
    "These are the \"sibsp\" and \"parch\" columns that tell us the number of siblinds or spouses and the number of parents or children each passenger has on-board respectively. We are going to use this information to build two columns:\n",
    " - \"family_size\" the size of the family of each passenger;\n",
    " - \"is_alone\" an additional boolean feature which indicates whether a passenger is traveling without their family. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.529058Z",
     "start_time": "2020-01-14T15:31:37.506913Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Expression = family_size\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  1\n",
       "   1  1\n",
       "   2  3\n",
       "   3  4\n",
       "   4  1\n",
       "  ...  \n",
       "1042  8\n",
       "1043  2\n",
       "1044  3\n",
       "1045  2\n",
       "1046  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Expression = is_alone\n",
       "Length: 1,047 dtype: int64 (column)\n",
       "-----------------------------------\n",
       "   0  0\n",
       "   1  0\n",
       "   2  0\n",
       "   3  0\n",
       "   4  0\n",
       "  ...  \n",
       "1042  0\n",
       "1043  0\n",
       "1044  0\n",
       "1045  0\n",
       "1046  0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Size of family that are on board: passenger + number of siblings, spouses, parents, children. \n",
    "df_train['family_size'] = (df_train.sibsp + df_train.parch + 1)\n",
    "display(df_train['family_size'])\n",
    "\n",
    "# Whether or not a passenger is alone\n",
    "df_train['is_alone'] = (df_train.family_size == 0).astype('int')\n",
    "display(df_train['is_alone'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's create two new features:\n",
    " - age $\\times$  class\n",
    " - fare per family member, i.e. fare $/$ family_size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.533441Z",
     "start_time": "2020-01-14T15:31:37.530475Z"
    }
   },
   "outputs": [],
   "source": [
    "# Create new features\n",
    "df_train['age_times_class'] = df_train.age * df_train.pclass\n",
    "\n",
    "# fare per person in the family\n",
    "df_train['fare_per_family_member'] = df_train.fare / df_train.family_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 1): gradient boosted trees\n",
    "\n",
    "Since this dataset contains a lot of categorical features, we will start with a tree based model. This we will gear the following feature pre-processing towards the use of tree-based models.\n",
    "\n",
    "### Feature pre-processing for boosted tree models\n",
    "\n",
    "The features \"sex\", \"embarked\", and \"deck\" can be simply label encoded. The feature \"name_tite\" contains certain a larger degree of cardinality, relative to the size of the training set, and in this case we will use the Frequency Encoder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.693870Z",
     "start_time": "2020-01-14T15:31:37.535185Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>0                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.20152817574021012           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>None                                </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>0                  </td><td>2                       </td><td>1                   </td><td>0.5787965616045845            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           57.0               7.8958                    0                    0                         0                     0.5787965616045845\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           None    nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      0                    0                         1                     0.5787965616045845\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      1                    0                         0                     0.1451766953199618\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      1                    0                         0                     0.20152817574021012\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           None    nan     None                                  Mr            4                 M       0              1            1              0           63.0               7.8542                    0                    0                         0                     0.5787965616045845\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           None    nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    0                    0                         0                     0.045845272206303724\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           None    nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    1                    0                         0                     0.1451766953199618\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     None                                  Master        4                 M       0              1            3              0           12.0               3.7111                    0                    0                         0                     0.045845272206303724\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           None    nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  0                    2                         1                     0.5787965616045845\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           72.0               7.4958                    0                    0                         0                     0.5787965616045845"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "label_encoder = vaex.ml.LabelEncoder(features=['sex', 'embarked', 'deck'], allow_unseen=True)\n",
    "df_train = label_encoder.fit_transform(df_train)\n",
    "\n",
    "# While doing a transform, previously unseen values will be encoded as \"zero\".\n",
    "frequency_encoder = vaex.ml.FrequencyEncoder(features=['name_title'], unseen='zero')\n",
    "df_train = frequency_encoder.fit_transform(df_train)\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once all the categorical data is encoded, we can select the features we are going to use for training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:37.756374Z",
     "start_time": "2020-01-14T15:31:37.695287Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  name_num_words</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">   fare</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               3</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\"> 7.8958</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">93.5   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               5</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">20.25  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">23     </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">               4</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\"> 7.8542</td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    multi_cabin    name_num_words    has_cabin    is_alone    family_size    age_times_class    fare_per_family_member    age     fare\n",
       "  0                    0                         0                     0                        0.578797              0                 3            1           0              1                 57                    7.8958     19   7.8958\n",
       "  1                    0                         0                     1                        0.578797              0                 4            1           0              1                 23                   93.5        23  93.5\n",
       "  2                    1                         0                     0                        0.145177              0                 5            1           0              3                105                    6.75       35  20.25\n",
       "  3                    1                         0                     0                        0.201528              0                 4            1           0              4                 40                    5.75       20  23\n",
       "  4                    0                         0                     0                        0.578797              0                 4            1           0              1                 63                    7.8542     21   7.8542"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# features to use for the trainin of the boosting model\n",
    "encoded_features = df_train.get_column_names(regex='^freque|^label')\n",
    "features = encoded_features + ['multi_cabin', 'name_num_words', \n",
    "                               'has_cabin', 'is_alone', \n",
    "                               'family_size', 'age_times_class',\n",
    "                               'fare_per_family_member',\n",
    "                               'age', 'fare']\n",
    "\n",
    "# Preview the feature matrix\n",
    "df_train[features].head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimator: [xgboost](https://xgboost.readthedocs.io/en/latest/)\n",
    "\n",
    "Now let's feed this data into an a tree based estimator. In this example we will use [xgboost](https://xgboost.readthedocs.io/en/latest/). In principle, any algorithm that follows the [scikit-learn](https://scikit-learn.org/stable/) API convention, i.e. it contains the `.fit`, `.predict` methods is compatable with `vaex`. However, the data will be materialized, i.e. will be read into memory before it is passed on to the estimators. We are hard at work trying to make at least some of the estimators from [scikit-learn](https://scikit-learn.org/stable/) run out-of-core!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:38.928652Z",
     "start_time": "2020-01-14T15:31:37.757644Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>pclass  </th><th>survived  </th><th>name                                          </th><th>sex   </th><th>age  </th><th>sibsp  </th><th>parch  </th><th>ticket   </th><th>fare    </th><th>cabin  </th><th>embarked  </th><th>boat  </th><th>body  </th><th>home_dest                           </th><th>name_title  </th><th>name_num_words  </th><th>deck  </th><th>multi_cabin  </th><th>has_cabin  </th><th>family_size  </th><th>is_alone  </th><th>age_times_class  </th><th>fare_per_family_member  </th><th>label_encoded_sex  </th><th>label_encoded_embarked  </th><th>label_encoded_deck  </th><th>frequency_encoded_name_title  </th><th>prediction_xgb  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>3       </td><td>False     </td><td>Stoytcheff, Mr. Ilia                          </td><td>male  </td><td>19.0 </td><td>0      </td><td>0      </td><td>349205   </td><td>7.8958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>57.0             </td><td>7.8958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>1       </td><td>False     </td><td>Payne, Mr. Vivian Ponsonby                    </td><td>male  </td><td>23.0 </td><td>0      </td><td>0      </td><td>12749    </td><td>93.5    </td><td>B24    </td><td>S         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>23.0             </td><td>93.5                    </td><td>0                  </td><td>0                       </td><td>1                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>3       </td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)              </td><td>female</td><td>35.0 </td><td>1      </td><td>1      </td><td>C.A. 2673</td><td>20.25   </td><td>M      </td><td>S         </td><td>A     </td><td>nan   </td><td>East Providence, RI                 </td><td>Mrs         </td><td>5               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>105.0            </td><td>6.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>2       </td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"                 </td><td>female</td><td>20.0 </td><td>2      </td><td>1      </td><td>29105    </td><td>23.0    </td><td>M      </td><td>S         </td><td>4     </td><td>nan   </td><td>Cornwall / Akron, OH                </td><td>Miss        </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>4            </td><td>0         </td><td>40.0             </td><td>5.75                    </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.20152817574021012           </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>3       </td><td>False     </td><td>Nilsson, Mr. August Ferdinand                 </td><td>male  </td><td>21.0 </td><td>0      </td><td>0      </td><td>350410   </td><td>7.8542  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>63.0             </td><td>7.8542                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td>...                              </td><td>...     </td><td>...       </td><td>...                                           </td><td>...   </td><td>...  </td><td>...    </td><td>...    </td><td>...      </td><td>...     </td><td>...    </td><td>...       </td><td>...   </td><td>...   </td><td>...                                 </td><td>...         </td><td>...             </td><td>...   </td><td>...          </td><td>...        </td><td>...          </td><td>...       </td><td>...              </td><td>...                     </td><td>...                </td><td>...                     </td><td>...                 </td><td>...                           </td><td>...             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>3       </td><td>False     </td><td>Goodwin, Master. Sidney Leonard               </td><td>male  </td><td>1.0  </td><td>5      </td><td>2      </td><td>CA 2144  </td><td>46.9    </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Wiltshire, England Niagara Falls, NY</td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>8            </td><td>0         </td><td>3.0              </td><td>5.8625                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>3       </td><td>False     </td><td>Ahlin, Mrs. Johan (Johanna Persdotter Larsson)</td><td>female</td><td>40.0 </td><td>1      </td><td>0      </td><td>7546     </td><td>9.475   </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>Sweden Akeley, MN                   </td><td>Mrs         </td><td>6               </td><td>M     </td><td>0            </td><td>1          </td><td>2            </td><td>0         </td><td>120.0            </td><td>4.7375                  </td><td>1                  </td><td>0                       </td><td>0                   </td><td>0.1451766953199618            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>3       </td><td>True      </td><td>Johnson, Master. Harold Theodor               </td><td>male  </td><td>4.0  </td><td>1      </td><td>1      </td><td>347742   </td><td>11.1333 </td><td>M      </td><td>S         </td><td>15    </td><td>nan   </td><td>None                                </td><td>Master      </td><td>4               </td><td>M     </td><td>0            </td><td>1          </td><td>3            </td><td>0         </td><td>12.0             </td><td>3.7111                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.045845272206303724          </td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>1       </td><td>False     </td><td>Baxter, Mr. Quigg Edmond                      </td><td>male  </td><td>24.0 </td><td>0      </td><td>1      </td><td>PC 17558 </td><td>247.5208</td><td>B58 B60</td><td>C         </td><td>None  </td><td>nan   </td><td>Montreal, PQ                        </td><td>Mr          </td><td>4               </td><td>B     </td><td>1            </td><td>1          </td><td>2            </td><td>0         </td><td>24.0             </td><td>123.7604                </td><td>0                  </td><td>2                       </td><td>1                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>3       </td><td>False     </td><td>Coleff, Mr. Satio                             </td><td>male  </td><td>24.0 </td><td>0      </td><td>0      </td><td>349209   </td><td>7.4958  </td><td>M      </td><td>S         </td><td>None  </td><td>nan   </td><td>None                                </td><td>Mr          </td><td>3               </td><td>M     </td><td>0            </td><td>1          </td><td>1            </td><td>0         </td><td>72.0             </td><td>7.4958                  </td><td>0                  </td><td>0                       </td><td>0                   </td><td>0.5787965616045845            </td><td>False           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      pclass    survived    name                                            sex     age    sibsp    parch    ticket     fare      cabin    embarked    boat    body    home_dest                             name_title    name_num_words    deck    multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title    prediction_xgb\n",
       "0      3         False       Stoytcheff, Mr. Ilia                            male    19.0   0        0        349205     7.8958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           57.0               7.8958                    0                    0                         0                     0.5787965616045845              False\n",
       "1      1         False       Payne, Mr. Vivian Ponsonby                      male    23.0   0        0        12749      93.5      B24      S           None    nan     Montreal, PQ                          Mr            4                 B       0              1            1              0           23.0               93.5                      0                    0                         1                     0.5787965616045845              False\n",
       "2      3         True        Abbott, Mrs. Stanton (Rosa Hunt)                female  35.0   1        1        C.A. 2673  20.25     M        S           A       nan     East Providence, RI                   Mrs           5                 M       0              1            3              0           105.0              6.75                      1                    0                         0                     0.1451766953199618              True\n",
       "3      2         True        Hocking, Miss. Ellen \"Nellie\"                   female  20.0   2        1        29105      23.0      M        S           4       nan     Cornwall / Akron, OH                  Miss          4                 M       0              1            4              0           40.0               5.75                      1                    0                         0                     0.20152817574021012             True\n",
       "4      3         False       Nilsson, Mr. August Ferdinand                   male    21.0   0        0        350410     7.8542    M        S           None    nan     None                                  Mr            4                 M       0              1            1              0           63.0               7.8542                    0                    0                         0                     0.5787965616045845              False\n",
       "...    ...       ...         ...                                             ...     ...    ...      ...      ...        ...       ...      ...         ...     ...     ...                                   ...           ...               ...     ...            ...          ...            ...         ...                ...                       ...                  ...                       ...                   ...                             ...\n",
       "1,042  3         False       Goodwin, Master. Sidney Leonard                 male    1.0    5        2        CA 2144    46.9      M        S           None    nan     Wiltshire, England Niagara Falls, NY  Master        4                 M       0              1            8              0           3.0                5.8625                    0                    0                         0                     0.045845272206303724            False\n",
       "1,043  3         False       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0   1        0        7546       9.475     M        S           None    nan     Sweden Akeley, MN                     Mrs           6                 M       0              1            2              0           120.0              4.7375                    1                    0                         0                     0.1451766953199618              False\n",
       "1,044  3         True        Johnson, Master. Harold Theodor                 male    4.0    1        1        347742     11.1333   M        S           15      nan     None                                  Master        4                 M       0              1            3              0           12.0               3.7111                    0                    0                         0                     0.045845272206303724            True\n",
       "1,045  1         False       Baxter, Mr. Quigg Edmond                        male    24.0   0        1        PC 17558   247.5208  B58 B60  C           None    nan     Montreal, PQ                          Mr            4                 B       1              1            2              0           24.0               123.7604                  0                    2                         1                     0.5787965616045845              False\n",
       "1,046  3         False       Coleff, Mr. Satio                               male    24.0   0        0        349209     7.4958    M        S           None    nan     None                                  Mr            3                 M       0              1            1              0           72.0               7.4958                    0                    0                         0                     0.5787965616045845              False"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import xgboost\n",
    "import vaex.ml.sklearn\n",
    "\n",
    "# Instantiate the xgboost model normally, using the scikit-learn API\n",
    "xgb_model = xgboost.sklearn.XGBClassifier(max_depth=11,\n",
    "                                          learning_rate=0.1, \n",
    "                                          n_estimators=500, \n",
    "                                          subsample=0.75, \n",
    "                                          colsample_bylevel=1, \n",
    "                                          colsample_bytree=1,\n",
    "                                          scale_pos_weight=1.5,\n",
    "                                          reg_lambda=1.5, \n",
    "                                          reg_alpha=5, \n",
    "                                          n_jobs=-1,\n",
    "                                          random_state=42,\n",
    "                                          verbosity=0)\n",
    "\n",
    "# Make it work with vaex (for the automagic pipeline and lazy predictions)\n",
    "vaex_xgb_model = vaex.ml.sklearn.Predictor(features=features,\n",
    "                                           target='survived',\n",
    "                                           model=xgb_model, \n",
    "                                           prediction_name='prediction_xgb')\n",
    "# Train the model\n",
    "vaex_xgb_model.fit(df_train)\n",
    "# Get the prediction of the model on the training data\n",
    "df_train = vaex_xgb_model.transform(df_train)\n",
    "\n",
    "# Preview the resulting train dataframe that contans the predictions\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that in the above cell block, we call `.transform` on the `vaex_xgb_model` object. This adds the \"prediction_xgb\" column as _virtual column_ in the output dataframe. This can be quite convenient when calculating various metrics and making diagnosic plots. Of course, one can call a `.predict` on the `vaex_xgb_model` object, which returns an in-memory `numpy` array object housing the predictions.\n",
    "\n",
    "### Performance on training set\n",
    "\n",
    "Anyway, let's see what the performance is of the model on the training set. First let's create a convenience function that will help us get multiple metrics at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:38.935150Z",
     "start_time": "2020-01-14T15:31:38.930318Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score, f1_score, roc_auc_score\n",
    "def binary_metrics(y_true, y_pred):\n",
    "    acc = accuracy_score(y_true=y_true, y_pred=y_pred)\n",
    "    f1 = f1_score(y_true=y_true, y_pred=y_pred)\n",
    "    roc = roc_auc_score(y_true=y_true, y_score=y_pred)\n",
    "    print(f'Accuracy: {acc:.3f}')\n",
    "    print(f'f1 score: {f1:.3f}')\n",
    "    print(f'roc-auc: {roc:.3f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's check the performance of the model on the training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:38.972402Z",
     "start_time": "2020-01-14T15:31:38.936933Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the training set:\n",
      "Accuracy: 0.924\n",
      "f1 score: 0.896\n",
      "roc-auc: 0.914\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the training set:')\n",
    "binary_metrics(y_true=df_train.survived.values, y_pred=df_train.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Automatic pipelines\n",
    "\n",
    "Now, let's inspect the performance of the model on the test set. You probably noticed that, unlike when using other libraries, we did not bother to create a pipeline while doing all the cleaning, inputing, feature engineering and categorial encoding. Well, we did not _explicitly_ create a pipeline. In fact `veax` keeps track of all the changes one applies to a DataFrame in something called a state. A state is the place which contains all the informations regarding, for instance, the virtual columns we've created, which includes the newly engineered features, the categorically encoded columns, and even the model prediction! So all we need to do, is to extract the state from the training DataFrame, and apply it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.130730Z",
     "start_time": "2020-01-14T15:31:38.974124Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O'Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>None  </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td>True            </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           None       nan  None                      Mr                           3  M                   0            1              1           0             84.096                    7.75                      0                         1                     0                        0.578797  False\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           None       nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      0                         1                     0                        0.578797  False\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           None       189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    0                         2                     4                        0.578797  True\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           None       nan  None                      Mr                           3  M                   0            1              1           0             63                        7.25                      0                         0                     0                        0.578797  False\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         1                         0                     0                        0.145177  True"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# state transfer to the test set\n",
    "state = df_train.state_get()\n",
    "df_test.state_set(state)\n",
    "\n",
    "# Preview of the \"transformed\" test set\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that once we apply the state from the train to the test set, the test DataFrame contains all the features we created or modified in the training data, and even the predictions of the xgboost model!\n",
    "\n",
    "The state is a simple Python dictionary, which can be easily stored as JSON to disk, which makes it very easy to deploy.\n",
    "\n",
    "### Performance on test set\n",
    "\n",
    "Now it is trivial to check the model performance on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.159323Z",
     "start_time": "2020-01-14T15:31:39.133147Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Metrics for the test set:\n",
      "Accuracy: 0.798\n",
      "f1 score: 0.744\n",
      "roc-auc: 0.785\n"
     ]
    }
   ],
   "source": [
    "print('Metrics for the test set:')\n",
    "binary_metrics(y_true=df_test.survived.values, y_pred=df_test.prediction_xgb.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature importance\n",
    "Let's now look at the feature importance of the `xgboost` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.418941Z",
     "start_time": "2020-01-14T15:31:39.161295Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAIbCAYAAABLzPzHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOzdeZRdZZ318e8mzCSGWUwMSTejIjZqMfmKoKiIgQZlaBkEjDJo0wyLsbFb0RaFFgQ0aAARkEFAFEEDLdAYBBkrgECkVYZACBCGkImZZL9/nKfgcq3hVqVyKkX2Z61a3DrnGX7nVJZ313OeW8o2EREREXVZaqALiIiIiCVLwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiLeJiStLWmepCEDXUtEdxI+IiIWM5I+L+l2SS9Ierq8/qokddfP9mO2h9qeX1etEX2R8BERsRiRdARwOvA9YC3gncBBwP8Dlh3A0iL6jfIXTiMiFg+ShgNPAPvY/mUXbcYC3wbWAWYD59g+vpwbAzwCLGP7dUmTgJuAjwPvB24F9rT97CK9kIgeZOUjImLxsSWwHHBlN21eAPYBVgbGAl+RtHM37fcEvgisSbVycmT/lBrRdwkfERGLj9WBZ22/3nFA0i2SZkl6SdJHbU+yfZ/tBbbvBX4ObN3NmOfa/qvtl4DLgE0W7SVE9CzhIyJi8fEcsLqkpTsO2P6w7ZXLuaUkbS7p95KekTSbaj/I6t2M+VTD6xeBoYui8IjeSPiIiFh83Aq8AuzUTZuLgauAUbaHAxOAbj8FE7G4SfiIiFhM2J4FfBP4kaRdJQ2VtJSkTYCVSrNhwEzbL0vajGpPR8SgsnTPTSIioi62/1vSdOBo4GdUG0wfBo4BbgG+CpwiaTxwI9U+jpUHqNyIPslHbSMiIqJWeewSERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUat81DaiBauvvrrHjBkz0GVERAwqkydPftb2Gs3HEz4iWjBmzBja29sHuoyIiEFF0qOdHc9jl4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtVp6oAt4O5E0Ffiy7et7aGdgPdsP9mGOPvetk6TzgMdt/0edfReV+6bPZsyxEwe6jIiIWk09cewiGTcrHxEREVGrhI+IiIioVcLHIiBpM0m3Spol6UlJ4yUt29TsM5IelvSspO9JWqqh/zhJD0h6XtLvJI3u5fzLSTpZ0mOSZkiaIGmFcm4bSY9LOkLS06W+Lzb0XUHSKZIelTRb0s0Nff9Z0pRyXZMkvaeh3wck3SVprqRLgeWbatpB0j2l7y2S3t9q3y6ucXVJvy3jzZR0U8c9lDRC0i8lPSPpEUmHNPS7WtIpDd9fKumnXcxxgKR2Se3zX5zd842PiIiWJHwsGvOBw4HVgS2BbYGvNrX5LNAGfBDYCRgHIGln4Djgc8AawE3Az3s5/0nA+sAmwLrASODrDefXAoaX418CzpC0Sjl3MvAh4MPAqsDRwAJJ65c6Dit1XQ38RtKyJVj9Grig9PkFsEvHZJI+CPwUOBBYDTgTuKqEpG77duMI4PFSyzup7plLAPkN8KdyfdsCh0narvQbB3xB0scl7QVsChza2QS2z7LdZrttyIrDWygpIiJakfCxCNiebPs226/bnkr1Zrt1U7OTbM+0/RhwGrBHOX4g8F3bD9h+HfgOsEmrqx+SBOwPHF7Gn1vG+HxDs9eAb9l+zfbVwDxgg/LGPQ441PZ02/Nt32L7FeBfgIm2r7P9GlVIWYEqpGwBLAOcVsa8HLizYb79gTNt317GPB94pfTrqW9XXgPeBYwu/W6ybaowsYbtb9l+1fbDwNkd12/7KeAg4HzgdGCfco8iIqImCR+LgKT1yyOBpyTNoXrzX72p2bSG148CI8rr0cDp5XHCLGAmIKrf4luxBrAiMLlhjP8pxzs8V4JNhxeBoaXG5YGHOhl3RKkTANsLyjWMLOemlzf/xmvqMBo4oqOeUtOo0q+nvl35HvAgcG15fHVsw1wjmuY6jmp1pMNvgSHAX2zf3MJcERHRj/JR20Xjx8DdwB6250o6DNi1qc0oYEp5vTbwRHk9DTjB9kV9nPtZ4CVgI9vT+9D3ZWAdqscWjZ4ANu74pqywjAKmAwZGSlJDiFibN0NMxzWd0DyhpK176NupslpxBFWo2Qj4vaQ7y1yP2F6vm+4nAA8A/yBpD9s9PtbaeORw2hfRR84iIpY0WflYNIYBc4B5kjYEvtJJm6MkrSJpFNWeg0vL8QnAv5c3VCQNl7RbqxOXFYmzgVMlrVnGGNmw56Gnvj8Fvl82bQ6RtKWk5YDLgLGStpW0DNUb/yvALcCtwOvAIZKWlvQ5YLOGoc8GDpK0uSorSRoraVgLfTtVNrCuW0LQHKp9NvOBO4A5ko4pm2eHSHqfpE1Lv48CXwT2KV8/lNTqqlJERPSDhI9F40hgT2Au1RvvpZ20uRKYDNwDTATOAbB9BdWG0UvKI5v7ge17Of8xVI8kbitjXA9s0Iva76PadzGz1LKU7b8AewM/pFoh2RHYseyreJVqg+x+wPNU+0N+1TGg7XaqfR/jy/kHS1t66tuN9cp1zaMKMD+yPcn2/FLbJsAjpdafAMMlvQP4GXBw2dNyM9V9P7eEmIiIqIHe+qg9IjrT1tbm9vb2gS4jImJQkTTZdlvz8ax8RERERK0SPgap8se+5nXytddA19ZfJB3XxTVeM9C1RURE3+XTLoOU7Y0GuoZFzfZ3qD6mHBERbyNZ+YiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUaumeGkjaALgEWBf4mu0fLPKq4u9I2g/4su2P1Nl3cSdpHvB+2w93cX4q1bVfvzDz3Dd9NmOOnbgwQ0REdGvqiWMHuoTatLLycTQwyfawBI8YSJImSfpy4zHbQzuCh6TzJH17YKqLiIhWtRI+RgNTOjshaUj/lhMRERFvd92GD0k3AB8DxkuaJ+liST+WdLWkF4CPSVpO0smSHpM0Q9IESSs0jHGUpCclPSFpnCRLWrece8tvspL2k3Rzw/cbSrpO0kxJf5G0e8O58ySdIWmipLmSbpe0TsP5jRr6zpB0nKS1JL0oabWGdh+S9IykZXq4F+MkPSDpeUm/kzS64ZwlHSTpb+X8GZLUcH7/0neupD9L+mA5/p5yD2ZJmiLpnxv6rCbpKklzJN0BrNNUT3f3ptu+3Vxjl9chaR1JN0h6TtKzki6StHJD36nlZ32vpBcknSPpnZKuKdd9vaRVGtpvIemWcu1/krRND7WdAGzFm/8WxzfUvK6kA4C9gKPL+d90MsZSko6V9FC5jsskrdrKvYmIiP7Tbfiw/XHgJuBg20OBV4E9gROAYcDNwEnA+sAmVPtCRgJfB5D0aeBI4JPAesAnWi1M0krAdcDFwJrAHsCPJG3U0GwP4JvAKsCDpS4kDQOuB/4HGFHq+l/bTwGTgN0bxtgbuMT2a93UsjNwHPA5YI1yT37e1GwHYFPgn8r425W+uwHHA/sA7wD+GXiuhJ3fANeW6/s34CJVe2wAzgBeBt4FjCtfrd6bLvu2oNPrAAR8l+p+vgcYVa6r0S5UP+v1gR2Ba6ju2+pU/9YOKfWPBCYC3wZWpfo38ktJa3RVlO2v0fBv0fbBTefPAi4C/ruc37GTYQ4Bdga2LtfxPNW96pSkAyS1S2qf/+LsrppFREQv9eXTLlfa/qPtBcArwP7A4bZn2p4LfAf4fGm7O3Cu7fttv8Dfv1l1Zwdgqu1zbb9u+y7gl8CuDW1+ZfsO269TvfFs0tD3Kdun2H7Z9lzbt5dz51MFjo7HRnsAF/RQy4HAd20/UOb6DrBJ4+oHcKLtWbYfA37fUMuXqd4Q73TlQduPAlsAQ0u/V23fAPwW2KPUtQvwddsv2L6/1N3jvWmhb086vY5S93W2X7H9DPB9qjfxRj+0PcP2dKqgcLvtu22/AlwBfKC02xu42vbVthfYvg5oBz7Tizr74kCqTdOPl5qOp7pnnW68tn2W7TbbbUNWHL6IS4uIWHL0+GmXTkxreL0GsCIwufEpA9CxF2QEMLmh/aO9mGc0sLmkWQ3HluatQeGphtcvUr2ZQ/Vb+UNdjHslMEHSP1L9hj7b9h0t1HK6pFMajolqlafjmnpbywhgWglxHR4tY65Bda3Tms411tPVvempb086vQ5JawI/oHr0MYwquD7f1HdGw+uXOvm+456MBnaT1Lg6sQxV2FmURgNXSGq85/OBdwLTF/HcERFR9CV8uOH1s1RvKhuV33abPUn15tth7abzL1CFlw5rNbyeBtxo+5N9qHEa1YrG37H9sqTLqPYHbEjPqx4d451g+6I+1tLZnosngFGSlmoIIGsDfwWeAV6nunf/13CuccxO701Z+eiub199l+pn/37bz5VHUeP7ONY04ALb+/eynxfy/DRgnO0/9nLeiIjoR30JH2+wvUDS2cCpkg62/XR5nv8+278DLgPOlfQzYCrwjaYh7gE+J+knVCsBX+LN35Z/C5wo6QtUf2cEqkcA82w/0ENpvwW+L+kw4MfAssB7Gx69/Kx8rQl8rYVLnQD8l6R7bE+RNBz4lO1ftND3J6WWm4G7qILIa8DtVOHr6LKi8v+o9klsanu+pF8Bx0saB4wB9qW6hx3X1+W96aFvXw0DZgOzys/4qIUY60LgTknbUe3NWYbqMdSDth/vpt8M4B8X4vwE4ARJ+9p+tOwx+bDtK3sqeOORw2lfgj6DHxGxKPXHXzg9hmqz522S5lC9mWwAYPsa4DTghtLmhqa+p1JtYp1BtS/hjZWFsn/kU1T7R56gehxwErBcTwWVvp+kejN/Cvgb1ad2Os7/EVgA3GV7agvjXVHmvqRc4/3A9j31K31/QbUR9mJgLvBrYFXbr1JtPt2eagXpR8A+tjtWKw6mekzxFHAecG7T9XV3b7rsuxC+CXyQKoBMBH7V14FsTwN2otqM+gzVisRR9Pzv8XSqPRrPS+rsb86cA7y3fILm1130vwq4VtJc4DZg8z5eRkRE9JHsnlaq+3lCycB6th+sdeK/r+MG4GLbPxnIOmJwaGtrc3t7+0CXERExqEiabLut+fhCPXYZrCRtSvVb/E4DXUtERMSSZon7P5aTdD7Vo6HDyuOLjuMTyh+nav6aMHDV9i9JW3VxjfMGurYOXdUnaauBri0iIvpH7SsfttVzq0U6/75dHD8IOKjmcmpl+ybe/LjrYqn8MbuIiHgbW+JWPiIiImJgJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjkLSBpLslzZV0SD+Ou5ekaxu+t6R1+2v8hnHXljRP0pD+HjsiIvqfbA90DTHAJJ0DzLF9+CKex8B6th9clPMsCsu9az2/a9/TBrqMQWvqiWMHuoSIGACSJttuaz6elY8AGA1MGegiIiJiyZDwsYSTdAPwMWB8eXRxaHkEM0fSNEnHN7QdUx6dfLGce17SQZI2lXSvpFmSxje030/SzZ3MuamkGZKWbji2i6R7eqh1M0ntpbYZkr7fVNfSkrYs19Hx9bKkqaXdUpKOlfSQpOckXSZp1YW9hxER0TsJH0s42x8HbgIOtj0U+BOwD7AyMBb4iqSdm7ptDqwH/AtwGvA14BPARsDukrbuYc47geeATzYc3hu4oIdyTwdOt/0OYB3gsk7GvtX20HItqwC3AT8vpw8Bdga2BkYAzwNndDWZpANK2Gmf/+LsHkqLiIhWJXzEW9ieZPs+2wts30v1xt0cJv7L9su2rwVeAH5u+2nb06mCzAdamOp8qsBBWX3YDri4hz6vAetKWt32PNu39dD+B6W+r5XvDwS+Zvtx268AxwO7Nq7ANLJ9lu02221DVhzewiVFREQrEj7iLSRtLun3kp6RNBs4CFi9qdmMhtcvdfL90BamuhDYUdJQYHfgJttP9tDnS8D6wP9JulPSDt1cx4HANsCetheUw6OBK8rjoVnAA8B84J0t1BsREf0k4SOaXQxcBYyyPRyYAKi/JymrJLcCnwW+QM+PXLD9N9t7AGsCJwGXS1qpuZ2krYD/Anay3fi8ZBqwve2VG76WL7VERERNOl1ujiXaMGCm7ZclbQbsCVzbQ5+++hlwLGVFoqfGkvYGfmf7mbJyAdXKRWObUcClwD62/9o0xATgBEn72n5U0hrAh21f2dPcG48cTns+LhoR0S+y8hHNvgp8S9Jc4Ot0sqmzH11BCR62X2ih/aeBKZLmUW0+/bztl5vabAusRbUq0vGJl46PEZ9Otapzbbm+26g2z0ZERI3yR8ZiQEl6CDjQ9vUDXUt32tra3N7ePtBlREQMKvkjY7HYkbQLYOCGga4lIiLqk/ARA0LSJODHwL82fBoFSdc0/ZGwjq/jBqzYiIjoV9lwGgPC9jZdHN++5lIiIqJmWfmIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImq19EAXMJhImgp82fb1PbQzsJ7tB/swR5/71knSecDjtv+jrr79cW8kTQIutP2T3vS7b/psxhw7sa/Tvi1MPXHsQJcQEW8TWfmIiIiIWiV8RERERK0SPvpA0maSbpU0S9KTksZLWrap2WckPSzpWUnfk7RUQ/9xkh6Q9Lyk30ka3cv5l5N0sqTHJM2QNEHSCuXcNpIel3SEpKdLfV9s6LuCpFMkPSpptqSbG/r+s6Qp5bomSXpPQ78PSLpL0lxJlwLLN9W0g6R7St9bJL2/1b7dXOdRpf4nJI1r9R6U8zuVeuZIekjSpzsZ/12S7pV0ZCv1RERE/0j46Jv5wOHA6sCWwLbAV5vafBZoAz4I7ASMA5C0M3Ac8DlgDeAm4Oe9nP8kYH1gE2BdYCTw9YbzawHDy/EvAWdIWqWcOxn4EPBhYFXgaGCBpPVLHYeVuq4GfiNp2RKsfg1cUPr8AtilYzJJHwR+ChwIrAacCVxVAkK3fbtSwsKRwCeB9YBPtHoPJG0G/Aw4ClgZ+CgwtWn8McCNwHjbJ3dRwwGS2iW1z39xdk8lR0REixI++sD2ZNu32X7d9lSqN9utm5qdZHum7ceA04A9yvEDge/afsD268B3gE1aXf2QJGB/4PAy/twyxucbmr0GfMv2a7avBuYBG5TVl3HAoban255v+xbbrwD/Aky0fZ3t16hCygpUIWULYBngtDLm5cCdDfPtD5xp+/Yy5vnAK6VfT327sjtwru37bb8AHN+Le/Al4KflWhaUa/2/hrHfC0wCvmH7rK4KsH2W7TbbbUNWHN5CyRER0Yp82qUPyirB96lWNlakuo+Tm5pNa3j9KDCivB4NnC7plMYhqX5zf7SF6dcoc06u3oPf6D+koc1zJdh0eBEYSrVSszzwUCfjjmic3/YCSdNKXfOB6bbddE0dRgP7Svq3hmPLljHdQ9+ujOCt97SxT0/3YBTVyk1X9gIeBC5voY6IiOhnCR9982PgbmAP23MlHQbs2tRmFDClvF4beKK8ngacYPuiPs79LPASsJHt6X3o+zKwDvCnpnNPABt3fFNWF0YB06kCxEhJaggRa/NmiOm4phOaJ5S0dQ99u/Jkmb/D2k3X0d09mFausSvHA58GLpb0edvze6iFjUcOpz0fNY2I6Bd57NI3w4A5wDxJGwJf6aTNUZJWkTQKOBS4tByfAPy7pI0AJA2XtFurE9teAJwNnCppzTLGSEnbtdj3p8D3JY2QNETSlpKWAy4DxkraVtIywBFUj05uAW4FXgcOkbS0pM8BmzUMfTZwkKTNVVlJ0lhJw1ro25XLgP0kvVfSisA3enEPzgG+WK5lqXJuw4axXwN2A1YCLlDDZuCIiFj08j+6fXMksCcwl+pN8NJO2lxJ9djgHmAi1Rsitq+g2ix5iaQ5wP3A9r2c/xiqxwa3lTGuBzboRe33Ue27mFlqWcr2X4C9gR9SrSzsCOxo+1Xbr1JtkN0PeJ5qf8ivOga03U61B2N8Of9gaUtPfbti+xqqvTI3lPFuaPUe2L4D+CJwKjCbamPpW/bUNNS1JvDTBJCIiProrY/iI6IzbW1tbm9vH+gyIiIGFUmTbbc1H89vexEREVGrhI/FVPljX/M6+dproGvrL5KO6+Iarxno2iIiYtHJp10WU7Y3GugaFjXb36H6+xwREbEEycpHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHwGApAmS/nOg6+iOpG0kPT7QdURExMJZeqALiPpJ2g/4su2PdByzfdDAVbT4u2/6bMYcO3Ggy+iVqSeOHegSIiI6lZWPiIiIqFXCx2JC0rGSHpI0V9KfJX22HB8i6RRJz0p6RNLBkixp6XJ+uKRzJD0pabqkb0sa0s087wEmAFtKmidpVjl+nqRvl9fbSHpc0tGSni5j7yzpM5L+KmmmpOMaxlyqof7nJF0madVybnlJF5bjsyTdKemdPdyLVSWdK+kJSc9L+nVv7lk5t66kGyXNLvfu0nJckk4t1zVb0r2S3tfaTykiIvpDHrssPh4CtgKeAnYDLpS0LrATsD2wCfAC8IumfucDM4B1gZWA3wLTgDM7m8T2A5IOoumxSyfWApYHRgL7AWcD1wEfAtYGJku6xPbDwCHAzsDWwDPAD4AzgD2AfYHhwCjglXIdL/VwLy4A5gEblf9+uIt2nd4z208C/wVcC3wMWBZoK30+BXwUWB+YDWwIzOpscEkHAAcADHnHGj2UHBERrcrKx2LC9i9sP2F7ge1Lgb8BmwG7A6fbftz288CJHX3KCsL2wGG2X7D9NHAq8Pl+KOk14ATbrwGXAKuXOubangJMAd5f2h4IfK3U+ApwPLBrWZ15DVgNWNf2fNuTbc/palJJ7yrXdJDt522/ZvvGztp2c8866h8NjLD9su2bG44Powodsv1ACSudjX+W7TbbbUNWHN7CLYuIiFYkfCwmJO0j6Z7yaGIW8D6qN/wRVCsZHRpfjwaWAZ5s6HcmsGY/lPSc7fnldcdKxYyG8y8BQxvquKKhhgeA+cA7qVYxfgdcUh6j/LekZbqZdxQwswStbnVzzwCOBgTcIWmKpHEAtm8AxlOtzMyQdJakd/Q0V0RE9J+Ej8WApNFUjzUOBlazvTJwP9Wb55PAuxuaj2p4PY3qUcbqtlcuX++wvVEPU7r/qn+jju0baljZ9vK2p5eVi2/afi/V45MdgH16GGtVSSt3N2EP9wzbT9ne3/YIqpWZH5XHWNj+ge0PUT3WWR84amEuPiIieid7PhYPK1EFgmcAJH2R6rd4gMuAQyVNpNrzcUxHJ9tPSroWOKX8jY55wD8A7+7qUUUxA3i3pGVtv9oP9U8ATpC0r+1HJa0BfNj2lZI+BjwL/BmYQ/XYY35XA5VruoYqLPxruaYtbf+hqWl39wxJuwG32n4ceL60nS9pU6rQfRfV/Xy5u3o6bDxyOO356GpERL/IysdiwPafgVOAW6mCwcbAH8vps6k2Tt4L3A1cDbzOm2+Y+1BtqPwz1Zvs5cC7epjyBqo9G09JerYfLuF04CrgWklzgduAzcu5tUpNc6gex9wIXNjDeF+gCin/BzwNHNbcoId7BrApcLukeaW2Q20/AryD6p4+DzwKPAec3LvLjYiIhSG7v1fgY1GStD0wwfboga5lSdLW1ub29vaBLiMiYlCRNNl2W/PxrHws5iStUP6+xtKSRgLfAK4Y6LoiIiL6KuFj8Sfgm1SPCe6menTx9R47Vf9fLfM6+ZqwiOttSRe1zZO01UDXFhERi1Y2nC7mbL9ItX+ht/0OAhbb/78W20N7bhUREW9HWfmIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1GrQhg9JG0i6W9JcSYcMdD39SdI7Jf2hXNsp/Tz2VpL+0vD9VEmf6M856iLpeEkXDnQdERHRO0sPdAEL4Whgku0PDHQhi8ABwLPAO2y7Pwe2fROwQX+OuSS4b/psxhw7caDLaNnUE8cOdAkREV0atCsfwGhgSm87SRrwwKVKd/d+NPDn/g4e0bXF4d9FRMSSYlCGD0k3AB8DxkuaJ+nQ8ghmjqRpko5vaDtGkiV9SdJjwA3l+BaSbpE0S9KfJG3TwryTJH1X0h2SZku6UtKqDee7HLP0PUHSH4EXgX/sYo7zgH2Bo8u1fULSZpJuLeM+KWm8pGUb+hCgzQ4AACAASURBVFjSVyX9rTyq+S9J65Q+cyRd1tFe0jaSHu9k3rUkvShptYZjH5L0jKRlurkn+0n6o6RTS30PS/pwOT5N0tOS9m1ov5ykkyU9JmmGpAmSVmisTdLRpd+TknaW9BlJf5U0U9JxTSUsL+nSct13SfqnhrlGSPpluYZHGh/PlUc2l0u6UNIcYL+urjEiIvrXoAwftj8O3AQcbHso8CdgH2BlYCzwFUk7N3XbGngPsJ2kkcBE4NvAqsCRwC8lrdHC9PsA44ARwOvADwBaHPMLVI9UhgGPdnFt+wEXAf9te6jt64H5wOHA6sCWwLbAV5u6fhr4ELAF1SOps4C9gFHA+4A9urso208Bk4DdGw7vDVxi+7Xu+gKbA/cCqwEXA5cAmwLrljHGSxpa2p4ErA9sUs6PBL7eMNZawPINx88uY3wI2Ar4uqTG4LYT8Auqe34x8GtJy5SVpd9Q/dsYSXXPDpO0XVPfy6n+3VzUwzVGREQ/GZTho5ntSbbvs73A9r3Az6nCRqPjbb9g+yWqN7OrbV9d+lwHtAOfaWG6C2zfb/sF4D+B3SUNaXHM82xPsf16C2/ojdc32fZtpd9U4MxOru8k23NsTwHuB661/bDt2cA1QCt7Y84v10G5pj2AC1ro94jtc23PBy6lCjzfsv2K7WuBV4F1JQnYHzjc9kzbc4HvAJ9vGOs14IRyfy6hClyn255brm0K8P6G9pNtX17af58quGxBFX7WsP0t26/afpgqyDTOdavtX5ef10vNFyXpAEntktrnvzi7hdsQERGteFs855a0OXAi1W/4ywLLUf023Ghaw+vRwG6Sdmw4tgzw+xamaxzn0dJv9RbHbOzbMknrU72xtgErUv3cJjc1m9Hw+qVOvl+rhamuBCaUlYX1gdm272ihX/Nc2G4+NhRYo9Q/ucohAAgY0tD2uRJi3hirk/GHNnz/xj21vaA8UhoBGBghaVZD2yFUK2Z/17czts+iWkFiuXetl/03ERH95G0RPqiW28cD29t+WdJpVIGgUeObxzSqFYz9+zDXqIbXa1P9pv5si2P29Q3sx8DdwB6250o6DNi1j2N1qdy7y6ge12xIa6sevfEsVXjYyPb0fhrzjZ9HedTybuAJqkdij9her5u+CRQREQPg7RI+hgEzy5vnZsCewLXdtL8QuLM8/7+eaoViC+BB23+3GbPJ3pJ+BkwFvgVcbnu+qr830dcxezIMmAPMk7Qh8BXgmYUcsys/K19rAl/rz4HLysTZwKmSDrb9dNkr8z7bv+vjsB+S9DngKuAQ4BXgNmABMEfSMVT7cl6l2vOzgu07ezvJxiOH056Pr0ZE9Iu3xZ4Pqs2X35I0l2qT4mXdNbY9jWqz4XFUb+LTgKNo7X5cAJwHPEW1v+CQfhizJ0dSBaq5VPsWLu2HMTtl+49Ub9x3lf0l/e0Y4EHgtvIpk+tZuL87ciXwL8DzVBt6P2f7tfLoZkeqja2PUK26/AQYvhBzRUREP1D+lETrJE0CLrT9k4GuZVFS9VHmi9/u19kbbW1tbm9vH+gyIiIGFUmTbbc1H3+7PHaJfiJpU+CDVKs4ERER/e7t8til35Q/7NXZ11aDcZ5e1nQ+1WOQw8rHYDuOT+ii1gkDVWtERAxeWfloUv5oWVe2qWmeAWF73y6OHwQcVHM5ERHxNpWVj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUaulB7qAVkmaCnzZ9vU9tDOwnu0H+zBHn/vWSdJ5wOO2/6POvn0haQzwCLCM7df7acx++TlJ2o/q39RHemp73/TZjDl24sJMV5upJ44d6BIiIrqVlY+IiIioVcJHDBqSBs1KXUREdG3QhQ9Jm0m6VdIsSU9KGi9p2aZmn5H0sKRnJX1P0lIN/cdJekDS85J+J2l0L+dfTtLJkh6TNEPSBEkrlHPbSHpc0hGSni71fbGh7wqSTpH0qKTZkm5u6PvPkqaU65ok6T0N/T4g6S5JcyVdCizfVNMOku4pfW+R9P5W+3Zznd2NOVXSUZLulfSCpHMkvVPSNWWe6yWt0jTkOElPlHtyRMNY3f48JVnSv0r6G/C3Tur8iKRpkj5Wvt9Q0nWSZkr6i6TdG9quJukqSXMk3QGs08M9OEBSu6T2+S/ObuW2RURECwZd+ADmA4cDqwNbAtsCX21q81mgDfggsBMwDkDSzsBxwOeANYCbgJ/3cv6TgPWBTYB1gZHA1xvOrwUML8e/BJzR8EZ8MvAh4MPAqsDRwAJJ65c6Dit1XQ38RtKy5Y3418AFpc8vgF06JpP0QeCnwIHAasCZwFUlJHXbtyvdjdnQbBfgk+Ve7AhcQ3VvV6f6d3VI07AfA9YDPgUcK+kT5XgrP8+dgc2B9zbVuR3VfdvF9u8lrQRcB1wMrAnsAfxI0kalyxnAy8C7qP5NjOvuPtg+y3ab7bYhKw7vrmlERPTCoAsftifbvs3267anUr0xbt3U7CTbM20/BpxG9SYE1Zvpd20/UDY/fgfYpNXVD0kC9gcOL+PPLWN8vqHZa8C3bL9m+2pgHrBBWX0ZBxxqe7rt+bZvsf0K8C/ARNvX2X6NKqSsQBVStgCWAU4rY14O3Nkw3/7AmbZvL2OeD7xS+vXUtyvdjdnhh7Zn2J5OFeJut313uZ4rgA80jflN2y/Yvg84l/IzafHn+d1yv19qOLYbcBbwGdt3lGM7AFNtn1vGuwv4JbCrpCFUgenrpY77gfNbuBcREdHPBt0z9LJK8H2qlY0Vqa5hclOzaQ2vHwVGlNejgdMlndI4JNUqxaMtTL9GmXNylUPe6D+koc1zTZ/qeBEYSvWb/fLAQ52MO6JxftsLJE0rdc0Hptt20zV1GA3sK+nfGo4tW8Z0D3270t2YHWY0vH6pk++HNo3Z/DPZGPr08+xwGPCzEmYa695c0qyGY0tTrfysUV431xERETUbdOED+DFwN7CH7bmSDgN2bWozCphSXq8NPFFeTwNOsH1RH+d+luqNdaPyG39v+75Mtc/gT03nnqC8GcMbKyyjgOlUAWKkJDWEiLV5M8R0XNMJzRNK2rqHvl3pcsyFMAr4v4YaOn4mrfw8zd/bDThH0nTbpzXUfaPtTzY3Lisfr3dSR0s2Hjmc9nyENSKiXwy6xy7AMGAOME/ShsBXOmlzlKRVJI0CDgUuLccnAP/esQdA0nBJu7U6se0FwNnAqZLWLGOMLHsPWun7U+D7kkZIGiJpy7KP4jJgrKRtJS0DHEH1mOMW4FaqN81DJC0t6XPAZg1Dnw0cJGlzVVaSNFbSsBb6dqW7MfvqPyWtWO79F3nzZ9LKz7MzT1DtDzlEUscekd8C60v6gqRlytemkt5jez7wK+D4Usd7gX0X4noiIqKPBmP4OBLYE5hL9SZ5aSdtrqRaur8HmAicA2D7CqoNo5dImgPcD2zfy/mPAR4EbitjXA9s0Iva76PadzGz1LKU7b8AewM/pFoh2RHY0fartl+l2iC7H/A81f6QX3UMaLudao/G+HL+wdKWnvp2pbsxF8KNZZz/BU62fW053srPs6s6H6MKIMdI+nLZg/Mpqj04TwBPUd3jjo2yB1M9DnoKOI9q70lERNRMb90OEBGdaWtrc3t7+0CXERExqEiabLut+fhgXPmIiIiIQSzhoxOq/tjXvE6+9hro2vqLpOO6uMZrBrq2iIh4exuMn3ZZ5Gxv1HOrwc32d6j+RklEREStsvIRERERtUr4iIiIiFolfEREREStEj4iIiKiVgkfERERUauEj4iIiKhVwkdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqNXSA11ADH6SjgfWtb13F+f3Ava1/alFNL+B9Ww/uKjmvm/6bMYcO3FhhqjF1BPHDnQJERE9yspH9CtJYyRZ0hvB1vZFiyp49GQg546IiM4lfEREREStEj6WYJKmSjpK0r2SXpB0jqR3SrpG0lxJ10taRdI2kh7vpO8nOhn2D+W/syTNk7SlpP0k3dxCPRtJuk7STEkzJB1Xjm8m6VZJsyQ9KWm8pGWbun9G0sOSnpX0PUlLlb5vmbusyhwk6W+Snpd0hiT16sZFRMRCSfiIXYBPAusDOwLXAMcBq1P9+zikl+N9tPx3ZdtDbd/aSidJw4Drgf8BRgDrAv9bTs8HDi81bQlsC3y1aYjPAm3AB4GdgHHdTLcDsCnwT8DuwHZd1HSApHZJ7fNfnN3KZURERAsSPuKHtmfYng7cBNxu+27brwBXAB+oqY4dgKdsn2L7Zdtzbd8OYHuy7dtsv257KnAmsHVT/5Nsz7T9GHAasEc3c51oe1Zp+3tgk84a2T7LdpvttiErDl/Y64uIiCKfdokZDa9f6uT7oTXVMQp4qLMTktYHvk+1srEi1b/byU3NpjW8fpRq9aQrTzW8fpH6rjEiIkj4iNa8QPWmD4CkIcAaXbR1H+eYRterFT8G7gb2sD1X0mHArk1tRgFTyuu1gSf6WEenNh45nPZ8jDUiol/ksUu04q/A8pLGSloG+A9guS7aPgMsAP6xl3P8FlhL0mGSlpM0TNLm5dwwYA4wT9KGwFc66X9U2Rw7CjgUuLSX80dERE0SPqJHtmdTbfD8CTCdaiXk8S7avgicAPyxfDplixbnmEu18XVHqscifwM+Vk4fCewJzAXOpvNgcSXVo5h7gInAOa3MGxER9ZPd11XyiCVHW1ub29vbB7qMiIhBRdJk223Nx7PyEREREbXKhtOojaStqP6OyN+xnU+cREQsIRI+oja2byIfa42IWOLlsUtERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj4iIiKiVksPdAERg8F902cz5tiJA11Gl6aeOHagS4iIaFlWPuJtT9I2kh4f6DoiIqKS8BERERG1SviItxVJeZQYEbGYS/hYhCRNlXSkpHslzZZ0qaTlJa0i6beSnpH0fHn97oZ+kyR9W9ItkuZJ+o2k1SRdJGmOpDsljWlov6Gk6yTNlPQXSbu3UNt5ks6QNFHSXEm3S1qnnBsjyY1v5KWmL5fX+0n6o6RTJc2S9LCkD5fj0yQ9LWnfHub/h9J3qfL9TyQ93XD+QkmHldcjJF1Vru9BSfs3tDte0uWl/RxgP0krlOt7XtKfgU2b5j5G0vRy3X+RtG0XNR4gqV1S+/wXZ/d0SyMiokUJH4ve7sCngX8A3g/sR3XfzwVGA2sDLwHjm/p9HvgCMBJYB7i19FkVeAD4BoCklYDrgIuBNYE9gB9J2qiF2vYAvgmsAjwInNCL69ocuBdYrcx9CdWb/LrA3sB4SUO76mz7EWAO8IFyaCtgnqT3lO8/CtxYXv8ceBwYAewKfKcpMOwEXA6sDFxEdW/WKV/bAW8EIUkbAAcDm9oeVs5P7aLGs2y32W4bsuLwHm5HRES0KuFj0fuB7SdszwR+A2xi+znbv7T9ou25VG/6Wzf1O9f2Q7ZnA9cAD9m+3vbrwC948017B2Cq7XNtv277LuCXVG/SPfmV7TvKmBcBm/Tiuh4pc84HLgVGAd+y/Yrta4FXqYJId24Etpa0Vvn+8vL9PwDvAP4kaRTwEeAY2y/bvgf4CVUw63Cr7V/bXmD7JarAd4LtmbanAT9oaDsfWA54r6RlbE+1/VAvrjsiIhZSwsei91TD6xeBoZJWlHSmpEfLo4I/ACtLGtLQdkbD65c6+b5jVWE0sHl5hDFL0ixgL2AtevZ3tbV2SZ3Wh+2uauzKjcA2VKscfwAmUYWwrYGbbC+gWu2YWUJah0epVoQ6TGsad0TTsUc7Xth+EDgMOB54WtIlkkb0UGdERPSjbM4bGEcAGwCb235K0ibA3YD6MNY04Ebbn+zH+l4o/12R6tEItBZmeutG4HtUj1RuBG4GJgAv8+YjlyeAVSUNawggawPTG8Zx07hPUq3ETGlo/2Zj+2LgYknvAM4ETuKtKyl/Z+ORw2nP39KIiOgXWfkYGMOoVgZmSVqVsn+jj34LrC/pC5KWKV+bNuyd6DXbz1C9ue8taYikcVT7J/qV7b9R3Ye9gT/YnkO1orILJXyUxya3AN8tm3XfD3yJ6jFRVy4D/r1s7H038G8dJyRtIOnjkpajCjkvUT2KiYiImiR8DIzTgBWAZ4HbgP/p60BlNeBTVBtUn6B6lHIS1b6GhbE/cBTwHLARVQBYFG4EnrP9WMP3oloJ6rAHMIbq+q4AvmH7um7G/CbVo5ZHgGuBCxrOLQecSHXvn6LapHvcQl9FRES0THbzinVENGtra3N7e/tAlxERMahImmy7rfl4Vj4iIiKiVgkfb2OSppQ/Utb8tdeSVENERCxe8mmXtzHbrfyhsbd9DRERsXjJykdERETUKuEjIiIiapXwEREREbVK+IiIiIhaJXxERERErRI+IiIiolYJHxEREVGrhI+IiIioVcJHRERE1CrhIyIiImqV8BERERG1SviIiIiIWiV8RERERK0SPiIiIqJWCR8RERFRq4SPiIiIqFXCR0RERNQq4SMiIiJqlfARERERtUr4iIiIiFolfEREREStEj5iUJG0gaS7Jc2VdMhA1xMREb239EAXENFLRwOTbH+gzknvmz6bMcdOrHNKAKaeOLb2OSMiFrWsfMRgMxqY0ttOkhK0IyIWEwkfMWhIugH4GDBe0jxJh5ZHMHMkTZN0fEPbMZIs6UuSHgNuKMe3kHSLpFmS/iRpmwG5mIiIJVjCRwwatj8O3AQcbHso8CdgH2BlYCzwFUk7N3XbGngPsJ2kkcBE4NvAqsCRwC8lrVHTJUREBAkfMYjZnmT7PtsLbN8L/JwqbDQ63vYLtl8C9gautn116XMd0A58prPxJR0gqV1S+/wXZy/Sa4mIWJIkfMSgJWlzSb+X9Iyk2cBBwOpNzaY1vB4N7FYeucySNAv4CPCuzsa3fZbtNtttQ1YcvkiuISJiSZTwEYPZxcBVwCjbw4EJgJrauOH1NOAC2ys3fK1k+8Sa6o2ICPJR2xjchgEzbb8saTNgT+DabtpfCNwpaTvgemAZYAvgQduPdzfRxiP/f3v3H2RnVd9x/P0xsSKEBIHI8EOCFaUVKEy7yrRWygiOoNYfbbFUOihVqeM4rbV2RGf81QGhVlqxdYpQsVir/FKcKvhjphUtjDIsalVARDSRHwmGQEIClUr49o/77PSy7G52k3vP3Uver5ln5u55znOec+7Jbj45z7mbFUz6sVdJGghXPjTO3gT8dZLNwLuBS+eqXFW3Ay8H3gmsp7cS8lf4fSBJTbnyobFSVcf0vb4cuHyWeqt57CMYquo6HrspVZLUkP/ikyRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU0tH3QFpHHzvzk0cdPqVze63+uyXNLuXJLXmyockSWrK8CFJkpoyfGgsJTk9yW1JNie5Kckru/IlSc5Jck+SnyR5c5JKsrQ7vyLJx5KsTXJnkjOSLBntaCRp5+KeD42r24DnA+uAE4FPJjkYeDlwAnAk8ABw2bTrLgLuBg4GdgO+ANwOfHT6DZKcBpwGsGT5yqEMQpJ2Rq58aCxV1WVVdVdVPVJVlwC3As8FXgWcW1V3VNV9wNlT1yTZh14weUtVPVBVPwP+HjhplnucX1UTVTWxZNcVQx+TJO0sXPnQWEpyCvBW4KCuaBmwN7AfvZWMKf2vVwFPBNYmmSp7wrQ6kqQhM3xo7CRZBVwAHAt8o6q2JvkOEGAtcEBf9af1vb4deAjYu6oebtVfSdKjGT40jnYDClgPkORU4LDu3KXAnye5kt6ej7dPXVRVa5N8BTgnybuALcDTgQOq6mtz3fDw/Vcw6e/ekKSBcM+Hxk5V3QScA3yD3ubRw4Fru9MXAF8Bvgt8G7gKeBjY2p0/Bfgl4CbgPuByYN9WfZckQapq1H2QhibJCcB5VbVqR9qZmJioycnJAfVKknYOSW6oqonp5a586HElyZOTvDjJ0iT7A+8Brhh1vyRJ/8/wocebAO+j90jl28DNwLtH2iNJ0qO44VSPK1X1IPCcUfdDkjQ7Vz4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfjQwCS5MckxQ2j3X5KcMeh2JUmjsXTUHdDjR1UdOuo+SJIWP1c+JElSU4YPDUyS1UmOS/LcJJNJ7k9yd5K/m8e1lyVZl2RTkq8nmXUVJckbkvwoyb1J/j3Jfn3nKskbk9ya5L4kH0mSvvN/kuTm7tyXk6ya4z6ndeOYXL9+/ULeCknSHAwfGoZzgXOrajnwDODSeVzzReCZwFOBbwH/NlOlJC8AzgJeBewLrAEunlbtpcBzgCO6ei/qrn0F8E7g94CVwH8Bn56tQ1V1flVNVNXEypUr5zEESdJ8GD40DL8ADk6yd1VtqapvbuuCqrqwqjZX1UPAe4EjkqyYoerJwIVV9a2u7juA30xyUF+ds6tqY1X9FPgqcGRX/qfAWVV1c1U9DLwfOHKu1Q9J0uAZPjQMrwOeBfwgyfVJXjpX5SRLkpyd5LYk9wOru1N7z1B9P3qrHQBU1RZgA7B/X511fa8fBJZ1r1cB5ybZmGQjcC+QaddKkobMT7to4KrqVuCPkjyB3iOOy5PsVVUPzHLJq4GXA8fRCx4rgPvoBYPp7qIXIgBIshuwF3DnPLp2O3BmVc34SEeS1IYrHxq4JH+cZGVVPQJs7Iq3znHJ7sBD9FYwdqX3OGQ2nwJOTXJkkid1da+rqtXz6Np5wDumNrMmWZHkxHlcJ0kaIMOHhuF44MYkW+htPj2pqn4+R/1P0HuUcidwEzDrHpGq+g/gXcBngLX0NrSeNJ9OVdUVwN8AF3ePd74PnDCfayVJg5OqGnUfpEVvYmKiJicnR90NSRorSW6oqonp5a58SJKkpgwfaiLJyUm2zHDcOOq+SZLa8tMuaqL7hImfMpEkufIhSZLaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKYMH5IkqSnDhyRJasrwIUmSmjJ8SJKkpgwfkiSpKcOH5pRkdZLjRt2P2SR5bZJr5jj/xSSvadknSdLclo66A9IwVdUJo+6DJOnRXPmQJElNGT40H0cm+W6STUkuSbJLkqck+UKS9Unu614fMHVB9zjkx0k2J/lJkpO3dZMkb0hyc3fNTUl+vSs/PcltfeWvfOyl+Yeufz9IcmzfiauTvL6vT9ck+WDX558kmXVlJMlpSSaTTK5fv37Bb5okaWaGD83Hq4DjgacDvwa8lt6fnY8Dq4ADgf8B/hEgyW7Ah4ETqmp34LeA78x1gyQnAu8FTgGWAy8DNnSnbwOeD6wA3gd8Msm+fZcfBfwY2Bt4D/DZJHvOcqujgFu6uh8APpYkM1WsqvOraqKqJlauXDlX9yVJC2D40Hx8uKruqqp7gc8DR1bVhqr6TFU9WFWbgTOB3+m75hHgsCRPrqq1VXXjNu7xeuADVXV99fyoqtYAVNVl3f0fqapLgFuB5/Zd+zPgQ1X1i+78LcBLZrnPmqq6oKq2AhcB+wL7LOztkCTtCMOH5mNd3+sHgWVJdk3y0SRrktwPfB3YI8mSqnoA+EPgjcDaJFcm+ZVt3ONp9FY4HiPJKUm+k2Rjko3AYfRWLqbcWVXV9/UaYL9tjaWqHuxeLttG3yRJA2T40Pb6S+AQ4KiqWg4c3ZUHoKq+XFUvpLey8APggm20dzvwjOmFSVZ1174Z2Kuq9gC+P3Wfzv7THp0cCNy14BFJkpowfGh77U5vn8fGbn/Fe6ZOJNknycu6vR8PAVuArdto75+BtyX5jfQc3AWP3YAC1ndtn0pv5aPfU4E/S/LEbu/IrwJX7fgQJUnDYPjQ9voQ8GTgHuCbwJf6zj2B3srIXcC99PaCvGmuxqrqMnr7Rj4FbAY+B+xZVTcB5wDfAO4GDgeunXb5dcAzu76cCfxBVW1AkrQo5dGPyiXNZGJioiYnJ0fdDUkaK0luqKqJ6eWufEiSpKYMH2omyXlJtsxwnDfqvkmS2vH/dlEzVfVGeh+/lSTtxFz5kCRJTRk+JElSU4YPSZLUlOFDkiQ1ZfiQJElNGT4kSVJThg9JktSU4UOSJDVl+JAkSU0ZPiRJUlOGD0mS1JThQ5IkNWX4kCRJTRk+JElSU4YPSZLUVKpq1H2QFr0km4FbRt2PAdkbuGfUnRgQx7I4mM9dzAAAA/JJREFUOZbFaRRjWVVVK6cXLm3cCWlc3VJVE6PuxCAkmXQsi49jWZwcy3D42EWSJDVl+JAkSU0ZPqT5OX/UHRggx7I4OZbFybEMgRtOJUlSU658SJKkpgwfkiSpKcOHdkpJ9kxyRZIHkqxJ8uo56v5FknVJNiW5MMmTtqedYRngWK5O8vMkW7qj+e81me9YkhyW5MtJ7knymGfH4zQv8xjLOM3La5LckOT+JHck+UCSpQttZ5gGOJZxmpeTktzSfd//LMlFSZYvtJ1BMnxoZ/UR4H+BfYCTgX9Kcuj0SkleBJwOHAscBPwy8L6FtjNkgxoLwJurall3HDLUXs9svu/nL4BLgdftYDvDNKixwPjMy67AW+j9Mquj6P1Ze9t2tDNMgxoLjM+8XAs8r6pW0Pu+XwqcsR3tDE5VeXjsVAewW/eN9qy+sn8Fzp6h7qeA9/d9fSywbqHtLPaxdF9fDbx+HOal7/zBvR9jO9bOYh3LuM5LX723Ap8f53mZaSzjPC/AMuATwFWjnBdXPrQzehawtap+2Ff238BMSf/Q7lx/vX2S7LXAdoZlUGOZcla3/H9tkmMG3tu5Der9HLd5mY9xnZejgRsH0M6gDGosU8ZmXpL8dpJNwGbg94EPbU87g2L40M5oGbBpWtkmYPd51J16vfsC2xmWQY0F4O30lmT3p/f7AD6f5BmD6+o2Der9HLd52ZaxnJckpwITwAd3pJ0BG9RYYMzmpaquqd5jlwOAvwVWb087g2L40M5oC7B8Wtlyev8i2FbdqdebF9jOsAxqLFTVdVW1uaoeqqqL6D0nfvGA+zuXQb2f4zYvcxrHeUnyCuBs4ISqmvqPzMZyXmYZy1jOC0BV3Ql8Cbh4R9rZUYYP7Yx+CCxN8sy+siN47JIqXdkR0+rdXVUbFtjOsAxqLDMpIAPp5fwM6v0ct3lZqEU9L0mOBy4Afreqvre97QzJoMYyk0U9L9MsBaZWaUYzL6PaLOPhMcqDXur/NL3NVs+jt8x46Az1jgfWAc8GngL8J30bsebbzmIfC7AH8CJgF3o/mE4GHgAOWaRjSdfXZ9P7ob8L8KQxnZdZxzKG8/ICYANw9I60s9jHMobzcjJwYPdnbRXwNeCzo5yXZm+Sh8diOoA9gc91PzB+Cry6Kz+Q3jLkgX113wrcDdwPfHzaX3IztjNuYwFWAtfTW2rdCHwTeOFiHQu9jwrXtGP1OM7LXGMZw3n5KvBwVzZ1fHFM52XWsYzhvJwJ3NHVu4PeHpW9Rjkv/t8ukiSpKfd8SJKkpgwfkiSpKcOHJElqyvAhSZKaMnxIkqSmDB+SJKkpw4ckSWrK8CFJkpoyfEiSpKb+D9y49hfY1z6JAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x648 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(6, 9))\n",
    "\n",
    "ind = np.argsort(xgb_model.feature_importances_)[::-1]\n",
    "features_sorted = np.array(features)[ind]\n",
    "importances_sorted = xgb_model.feature_importances_[ind]\n",
    "\n",
    "plt.barh(y=range(len(features)), width=importances_sorted, height=0.2)\n",
    "plt.title('Gain')\n",
    "plt.yticks(ticks=range(len(features)), labels=features_sorted)\n",
    "plt.gca().invert_yaxis()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling (part 2): Linear models & Ensembles\n",
    "\n",
    "Given the randomness of the _Titanic dataset_ , we can be satisfied with the performance of `xgboost` model above. Still, it is always usefull to try a variety of models and approaches, especially since `vaex` makes makes this process rather simple. \n",
    "\n",
    "In the following part we will use a couple of linear models as our predictors, this time straight from `scikit-learn`. This requires us to pre-process the data in a slightly different way.\n",
    "\n",
    "### Feature pre-processing for linear models\n",
    "\n",
    "When using linear models, the safest option is to encode categorical variables with the one-hot encoding scheme, especially if they have low cardinality. We will do this for the \"family_size\" and \"deck\" features. Note that the \"sex\" feature is already encoded since it has only unique values options. \n",
    "\n",
    "The \"name_title\" feature is a bit more tricky. Since in its original form it has some values that only appear a couple of times, we will do a trick: we will one-hot encode the frequency encoded values. This will reduce cardinality of the feature, while also preserving the most important, i.e. most common values.\n",
    "\n",
    "Regarding the \"age\" and \"fare\", to add some variance in the model, we will not convert them to categorical as before, but simply remove their mean and standard-deviations (standard-scaling). We will do the same to the \"fare_per_family_member\" feature.\n",
    "\n",
    "\n",
    "Finally, we will drop out any other features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.432673Z",
     "start_time": "2020-01-14T15:31:39.421647Z"
    }
   },
   "outputs": [],
   "source": [
    "# One-hot encode categorical features\n",
    "one_hot = vaex.ml.OneHotEncoder(features=['deck', 'family_size', 'frequency_encoded_name_title'])\n",
    "df_train = one_hot.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.491653Z",
     "start_time": "2020-01-14T15:31:39.434514Z"
    }
   },
   "outputs": [],
   "source": [
    "# Standard scale numerical features\n",
    "standard_scaler = vaex.ml.StandardScaler(features=['age', 'fare', 'fare_per_family_member'])\n",
    "df_train = standard_scaler.fit_transform(df_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.497528Z",
     "start_time": "2020-01-14T15:31:39.492993Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['deck_A',\n",
       " 'deck_B',\n",
       " 'deck_C',\n",
       " 'deck_D',\n",
       " 'deck_E',\n",
       " 'deck_F',\n",
       " 'deck_G',\n",
       " 'deck_M',\n",
       " 'family_size_1',\n",
       " 'family_size_2',\n",
       " 'family_size_3',\n",
       " 'family_size_4',\n",
       " 'family_size_5',\n",
       " 'family_size_6',\n",
       " 'family_size_7',\n",
       " 'family_size_8',\n",
       " 'family_size_11',\n",
       " 'frequency_encoded_name_title_0_0009551098376313276',\n",
       " 'frequency_encoded_name_title_0_0019102196752626551',\n",
       " 'frequency_encoded_name_title_0_0028653295128939827',\n",
       " 'frequency_encoded_name_title_0_0057306590257879654',\n",
       " 'frequency_encoded_name_title_0_007640878701050621',\n",
       " 'frequency_encoded_name_title_0_045845272206303724',\n",
       " 'frequency_encoded_name_title_0_1451766953199618',\n",
       " 'frequency_encoded_name_title_0_20152817574021012',\n",
       " 'frequency_encoded_name_title_0_5787965616045845',\n",
       " 'standard_scaled_age',\n",
       " 'standard_scaled_fare',\n",
       " 'standard_scaled_fare_per_family_member',\n",
       " 'label_encoded_sex']"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the features for training a linear model\n",
    "features_linear = df_train.get_column_names(regex='^deck_|^family_size_|^frequency_encoded_name_title_')\n",
    "features_linear += df_train.get_column_names(regex='^standard_scaled_')\n",
    "features_linear += ['label_encoded_sex']\n",
    "features_linear"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Estimators: `SVC` and `LogisticRegression`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:39.545358Z",
     "start_time": "2020-01-14T15:31:39.498965Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:47.010890Z",
     "start_time": "2020-01-14T15:31:46.467319Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                            </th><th>sex   </th><th style=\"text-align: right;\">  age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket   </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest           </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th style=\"text-align: right;\">  family_size_11</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0009551098376313276</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0019102196752626551</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0028653295128939827</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0057306590257879654</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_007640878701050621</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_045845272206303724</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_1451766953199618</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_20152817574021012</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_5787965616045845</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Stoytcheff, Mr. Ilia            </td><td>male  </td><td style=\"text-align: right;\">   19</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>349205   </td><td style=\"text-align: right;\"> 7.8958</td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               57</td><td style=\"text-align: right;\">                  7.8958</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">            -0.807704</td><td style=\"text-align: right;\">             -0.493719</td><td style=\"text-align: right;\">                               -0.342804</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Payne, Mr. Vivian Ponsonby      </td><td>male  </td><td style=\"text-align: right;\">   23</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>12749    </td><td style=\"text-align: right;\">93.5   </td><td>B24    </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Montreal, PQ        </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>B     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               23</td><td style=\"text-align: right;\">                 93.5   </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   1</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">            -0.492921</td><td style=\"text-align: right;\">              1.19613 </td><td style=\"text-align: right;\">                                1.99718 </td><td>False           </td><td>True           </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       3</td><td>True      </td><td>Abbott, Mrs. Stanton (Rosa Hunt)</td><td>female</td><td style=\"text-align: right;\">   35</td><td style=\"text-align: right;\">      1</td><td style=\"text-align: right;\">      1</td><td>C.A. 2673</td><td style=\"text-align: right;\">20.25  </td><td>M      </td><td>S         </td><td>A     </td><td style=\"text-align: right;\">   nan</td><td>East Providence, RI </td><td>Mrs         </td><td style=\"text-align: right;\">               5</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            3</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">              105</td><td style=\"text-align: right;\">                  6.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">             0.45143 </td><td style=\"text-align: right;\">             -0.249845</td><td style=\"text-align: right;\">                               -0.374124</td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Hocking, Miss. Ellen \"Nellie\"   </td><td>female</td><td style=\"text-align: right;\">   20</td><td style=\"text-align: right;\">      2</td><td style=\"text-align: right;\">      1</td><td>29105    </td><td style=\"text-align: right;\">23     </td><td>M      </td><td>S         </td><td>4     </td><td style=\"text-align: right;\">   nan</td><td>Cornwall / Akron, OH</td><td>Miss        </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            4</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               40</td><td style=\"text-align: right;\">                  5.75  </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.201528</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 1</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">            -0.729008</td><td style=\"text-align: right;\">             -0.195559</td><td style=\"text-align: right;\">                               -0.401459</td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Nilsson, Mr. August Ferdinand   </td><td>male  </td><td style=\"text-align: right;\">   21</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>350410   </td><td style=\"text-align: right;\"> 7.8542</td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                </td><td>Mr          </td><td style=\"text-align: right;\">               4</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">               63</td><td style=\"text-align: right;\">                  7.8542</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">            -0.650312</td><td style=\"text-align: right;\">             -0.494541</td><td style=\"text-align: right;\">                               -0.343941</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                              sex       age    sibsp    parch  ticket        fare  cabin    embarked    boat      body  home_dest             name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb      deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    frequency_encoded_name_title_0_0009551098376313276    frequency_encoded_name_title_0_0019102196752626551    frequency_encoded_name_title_0_0028653295128939827    frequency_encoded_name_title_0_0057306590257879654    frequency_encoded_name_title_0_007640878701050621    frequency_encoded_name_title_0_045845272206303724    frequency_encoded_name_title_0_1451766953199618    frequency_encoded_name_title_0_20152817574021012    frequency_encoded_name_title_0_5787965616045845    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr    prediction_final\n",
       "  0         3  False       Stoytcheff, Mr. Ilia              male       19        0        0  349205      7.8958  M        S           None       nan  None                  Mr                           3  M                   0            1              1           0                 57                    7.8958                    0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1              -0.807704               -0.493719                                 -0.342804  False             False            False\n",
       "  1         1  False       Payne, Mr. Vivian Ponsonby        male       23        0        0  12749      93.5     B24      S           None       nan  Montreal, PQ          Mr                           4  B                   0            1              1           0                 23                   93.5                       0                         0                     1                        0.578797  False                    0         1         0         0         0         0         0         0                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1              -0.492921                1.19613                                   1.99718   False             True             False\n",
       "  2         3  True        Abbott, Mrs. Stanton (Rosa Hunt)  female     35        1        1  C.A. 2673  20.25    M        S           A          nan  East Providence, RI   Mrs                          5  M                   0            1              3           0                105                    6.75                      1                         0                     0                        0.145177  True                     0         0         0         0         0         0         0         1                0                0                1                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  1                                                   0                                                  0               0.45143                -0.249845                                 -0.374124  True              True             True\n",
       "  3         2  True        Hocking, Miss. Ellen \"Nellie\"     female     20        2        1  29105      23       M        S           4          nan  Cornwall / Akron, OH  Miss                         4  M                   0            1              4           0                 40                    5.75                      1                         0                     0                        0.201528  True                     0         0         0         0         0         0         0         1                0                0                0                1                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   1                                                  0              -0.729008               -0.195559                                 -0.401459  True              True             True\n",
       "  4         3  False       Nilsson, Mr. August Ferdinand     male       21        0        0  350410      7.8542  M        S           None       nan  None                  Mr                           4  M                   0            1              1           0                 63                    7.8542                    0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1              -0.650312               -0.494541                                 -0.343941  False             False            False"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The Support Vector Classifier\n",
    "vaex_svc = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                     target='survived',\n",
    "                                     model=SVC(max_iter=1000, random_state=42),\n",
    "                                     prediction_name='prediction_svc')\n",
    "\n",
    "# Logistic Regression\n",
    "vaex_logistic = vaex.ml.sklearn.Predictor(features=features_linear, \n",
    "                                          target='survived',\n",
    "                                          model=LogisticRegression(max_iter=1000, random_state=42),\n",
    "                                          prediction_name='prediction_lr')\n",
    "\n",
    "# Train the new models and apply the transformation to the train dataframe\n",
    "for model in [vaex_svc, vaex_logistic]:\n",
    "    model.fit(df_train)\n",
    "    df_train = model.transform(df_train)\n",
    "    \n",
    "# Preview of the train DataFrame\n",
    "df_train.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ensemble\n",
    "\n",
    "Just as before, the predictions from the `SVC` and the `LogisticRegression` classifiers are added as virtual columns in the training dataset. This is quite powerful, since now we can easily use them to create an ensemble! For example, let's do a weighted mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:40.519589Z",
     "start_time": "2020-01-14T15:31:40.014057Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                                </th><th>prediction_xgb  </th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>    </td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>    </td><td>False           </td><td>False           </td><td>True           </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>    </td><td>True            </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>    </td><td>True            </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>    </td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td>...                              </td><td>...             </td><td>...             </td><td>...            </td><td>...               </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,042</i></td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,043</i></td><td>False           </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,044</i></td><td>True            </td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,045</i></td><td>False           </td><td>False           </td><td>True           </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1,046</i></td><td>False           </td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#      prediction_xgb    prediction_svc    prediction_lr    prediction_final\n",
       "0      False             False             False            False\n",
       "1      False             False             True             False\n",
       "2      True              True              True             True\n",
       "3      True              True              True             True\n",
       "4      False             False             False            False\n",
       "...    ...               ...               ...              ...\n",
       "1,042  False             False             False            False\n",
       "1,043  False             True              True             True\n",
       "1,044  True              True              True             True\n",
       "1,045  False             False             True             False\n",
       "1,046  False             False             False            False"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Weighed mean of the classes\n",
    "prediction_final = (df_train.prediction_xgb.astype('int') * 0.3 + \n",
    "                    df_train.prediction_svc.astype('int') * 0.5 + \n",
    "                    df_train.prediction_xgb.astype('int') * 0.2)\n",
    "# Get the predicted class\n",
    "prediction_final = (prediction_final >= 0.5)\n",
    "# Add the expression to the train DataFrame\n",
    "df_train['prediction_final'] = prediction_final\n",
    "\n",
    "# Preview\n",
    "df_train[df_train.get_column_names(regex='^predict')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance (part 2)\n",
    "\n",
    "Applying the ensembler to the test set is just as easy as before. We just need to get the new state of the training DataFrame, and transfer it to the test DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:40.939434Z",
     "start_time": "2020-01-14T15:31:40.521224Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                            </th><th style=\"text-align: right;\">  pclass</th><th>survived  </th><th>name                                        </th><th>sex   </th><th style=\"text-align: right;\">   age</th><th style=\"text-align: right;\">  sibsp</th><th style=\"text-align: right;\">  parch</th><th>ticket          </th><th style=\"text-align: right;\">   fare</th><th>cabin  </th><th>embarked  </th><th>boat  </th><th style=\"text-align: right;\">  body</th><th>home_dest               </th><th>name_title  </th><th style=\"text-align: right;\">  name_num_words</th><th>deck  </th><th style=\"text-align: right;\">  multi_cabin</th><th style=\"text-align: right;\">  has_cabin</th><th style=\"text-align: right;\">  family_size</th><th style=\"text-align: right;\">  is_alone</th><th style=\"text-align: right;\">  age_times_class</th><th style=\"text-align: right;\">  fare_per_family_member</th><th style=\"text-align: right;\">  label_encoded_sex</th><th style=\"text-align: right;\">  label_encoded_embarked</th><th style=\"text-align: right;\">  label_encoded_deck</th><th style=\"text-align: right;\">  frequency_encoded_name_title</th><th>prediction_xgb  </th><th style=\"text-align: right;\">  deck_A</th><th style=\"text-align: right;\">  deck_B</th><th style=\"text-align: right;\">  deck_C</th><th style=\"text-align: right;\">  deck_D</th><th style=\"text-align: right;\">  deck_E</th><th style=\"text-align: right;\">  deck_F</th><th style=\"text-align: right;\">  deck_G</th><th style=\"text-align: right;\">  deck_M</th><th style=\"text-align: right;\">  family_size_1</th><th style=\"text-align: right;\">  family_size_2</th><th style=\"text-align: right;\">  family_size_3</th><th style=\"text-align: right;\">  family_size_4</th><th style=\"text-align: right;\">  family_size_5</th><th style=\"text-align: right;\">  family_size_6</th><th style=\"text-align: right;\">  family_size_7</th><th style=\"text-align: right;\">  family_size_8</th><th style=\"text-align: right;\">  family_size_11</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0009551098376313276</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0019102196752626551</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0028653295128939827</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_0057306590257879654</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_007640878701050621</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_045845272206303724</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_1451766953199618</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_20152817574021012</th><th style=\"text-align: right;\">  frequency_encoded_name_title_0_5787965616045845</th><th style=\"text-align: right;\">  standard_scaled_age</th><th style=\"text-align: right;\">  standard_scaled_fare</th><th style=\"text-align: right;\">  standard_scaled_fare_per_family_member</th><th>prediction_svc  </th><th>prediction_lr  </th><th>prediction_final  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>O'Connor, Mr. Patrick                       </td><td>male  </td><td style=\"text-align: right;\">28.032</td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>366713          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           84.096</td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">           -0.096924 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Canavan, Mr. Patrick                        </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>364858          </td><td style=\"text-align: right;\"> 7.75  </td><td>M      </td><td>Q         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>Ireland Philadelphia, PA</td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.75  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       1</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.496597</td><td style=\"text-align: right;\">                               -0.346789</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i></td><td style=\"text-align: right;\">       1</td><td>False     </td><td>Ovies y Rodriguez, Mr. Servando             </td><td>male  </td><td style=\"text-align: right;\">28.5  </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>PC 17562        </td><td style=\"text-align: right;\">27.7208</td><td>D43    </td><td>C         </td><td>None  </td><td style=\"text-align: right;\">   189</td><td>?Havana, Cuba           </td><td>Mr          </td><td style=\"text-align: right;\">               5</td><td>D     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           28.5  </td><td style=\"text-align: right;\">                 27.7208</td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       2</td><td style=\"text-align: right;\">                   4</td><td style=\"text-align: right;\">                      0.578797</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">           -0.0600935</td><td style=\"text-align: right;\">             -0.102369</td><td style=\"text-align: right;\">                                0.19911 </td><td>False           </td><td>False          </td><td>True              </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i></td><td style=\"text-align: right;\">       3</td><td>False     </td><td>Windelov, Mr. Einar                         </td><td>male  </td><td style=\"text-align: right;\">21    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      0</td><td>SOTON/OQ 3101317</td><td style=\"text-align: right;\"> 7.25  </td><td>M      </td><td>S         </td><td>None  </td><td style=\"text-align: right;\">   nan</td><td>None                    </td><td>Mr          </td><td style=\"text-align: right;\">               3</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            1</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           63    </td><td style=\"text-align: right;\">                  7.25  </td><td style=\"text-align: right;\">                  0</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.578797</td><td>False           </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">           -0.650312 </td><td style=\"text-align: right;\">             -0.506468</td><td style=\"text-align: right;\">                               -0.360456</td><td>False           </td><td>False          </td><td>False             </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i></td><td style=\"text-align: right;\">       2</td><td>True      </td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td style=\"text-align: right;\">25    </td><td style=\"text-align: right;\">      0</td><td style=\"text-align: right;\">      1</td><td>230433          </td><td style=\"text-align: right;\">26     </td><td>M      </td><td>S         </td><td>12    </td><td style=\"text-align: right;\">   nan</td><td>Deer Lodge, MT          </td><td>Mrs         </td><td style=\"text-align: right;\">               6</td><td>M     </td><td style=\"text-align: right;\">            0</td><td style=\"text-align: right;\">          1</td><td style=\"text-align: right;\">            2</td><td style=\"text-align: right;\">         0</td><td style=\"text-align: right;\">           50    </td><td style=\"text-align: right;\">                 13     </td><td style=\"text-align: right;\">                  1</td><td style=\"text-align: right;\">                       0</td><td style=\"text-align: right;\">                   0</td><td style=\"text-align: right;\">                      0.145177</td><td>True            </td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       0</td><td style=\"text-align: right;\">       1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              1</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">               0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                   0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                  0</td><td style=\"text-align: right;\">                                                1</td><td style=\"text-align: right;\">                                                 0</td><td style=\"text-align: right;\">                                                0</td><td style=\"text-align: right;\">           -0.335529 </td><td style=\"text-align: right;\">             -0.136338</td><td style=\"text-align: right;\">                               -0.203281</td><td>True            </td><td>True           </td><td>True              </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "  #    pclass  survived    name                                          sex        age    sibsp    parch  ticket               fare  cabin    embarked    boat      body  home_dest                 name_title      name_num_words  deck      multi_cabin    has_cabin    family_size    is_alone    age_times_class    fare_per_family_member    label_encoded_sex    label_encoded_embarked    label_encoded_deck    frequency_encoded_name_title  prediction_xgb      deck_A    deck_B    deck_C    deck_D    deck_E    deck_F    deck_G    deck_M    family_size_1    family_size_2    family_size_3    family_size_4    family_size_5    family_size_6    family_size_7    family_size_8    family_size_11    frequency_encoded_name_title_0_0009551098376313276    frequency_encoded_name_title_0_0019102196752626551    frequency_encoded_name_title_0_0028653295128939827    frequency_encoded_name_title_0_0057306590257879654    frequency_encoded_name_title_0_007640878701050621    frequency_encoded_name_title_0_045845272206303724    frequency_encoded_name_title_0_1451766953199618    frequency_encoded_name_title_0_20152817574021012    frequency_encoded_name_title_0_5787965616045845    standard_scaled_age    standard_scaled_fare    standard_scaled_fare_per_family_member  prediction_svc    prediction_lr    prediction_final\n",
       "  0         3  False       O'Connor, Mr. Patrick                         male    28.032        0        0  366713             7.75    M        Q           None       nan  None                      Mr                           3  M                   0            1              1           0             84.096                    7.75                      0                         1                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1             -0.096924                -0.496597                                 -0.346789  False             False            False\n",
       "  1         3  False       Canavan, Mr. Patrick                          male    21            0        0  364858             7.75    M        Q           None       nan  Ireland Philadelphia, PA  Mr                           3  M                   0            1              1           0             63                        7.75                      0                         1                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1             -0.650312                -0.496597                                 -0.346789  False             False            False\n",
       "  2         1  False       Ovies y Rodriguez, Mr. Servando               male    28.5          0        0  PC 17562          27.7208  D43      C           None       189  ?Havana, Cuba             Mr                           5  D                   0            1              1           0             28.5                     27.7208                    0                         2                     4                        0.578797  True                     0         0         0         1         0         0         0         0                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1             -0.0600935               -0.102369                                  0.19911   False             False            True\n",
       "  3         3  False       Windelov, Mr. Einar                           male    21            0        0  SOTON/OQ 3101317   7.25    M        S           None       nan  None                      Mr                           3  M                   0            1              1           0             63                        7.25                      0                         0                     0                        0.578797  False                    0         0         0         0         0         0         0         1                1                0                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  0                                                   0                                                  1             -0.650312                -0.506468                                 -0.360456  False             False            False\n",
       "  4         2  True        Shelley, Mrs. William (Imanita Parrish Hall)  female  25            0        1  230433            26       M        S           12         nan  Deer Lodge, MT            Mrs                          6  M                   0            1              2           0             50                       13                         1                         0                     0                        0.145177  True                     0         0         0         0         0         0         0         1                0                1                0                0                0                0                0                0                 0                                                     0                                                     0                                                     0                                                     0                                                    0                                                    0                                                  1                                                   0                                                  0             -0.335529                -0.136338                                 -0.203281  True              True             True"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# State transfer\n",
    "state_new = df_train.state_get()\n",
    "df_test.state_set(state_new)\n",
    "\n",
    "# Preview\n",
    "df_test.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's check the performance of all the individual models as well as on the ensembler, on the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T15:31:41.030423Z",
     "start_time": "2020-01-14T15:31:40.941145Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prediction_xgb\n",
      "Accuracy: 0.798\n",
      "f1 score: 0.744\n",
      "roc-auc: 0.785\n",
      " \n",
      "prediction_svc\n",
      "Accuracy: 0.813\n",
      "f1 score: 0.763\n",
      "roc-auc: 0.801\n",
      " \n",
      "prediction_lr\n",
      "Accuracy: 0.794\n",
      "f1 score: 0.743\n",
      "roc-auc: 0.783\n",
      " \n",
      "prediction_final\n",
      "Accuracy: 0.832\n",
      "f1 score: 0.802\n",
      "roc-auc: 0.831\n",
      " \n"
     ]
    }
   ],
   "source": [
    "pred_columns = df_train.get_column_names(regex='^prediction_')\n",
    "for i in pred_columns:\n",
    "    print(i)\n",
    "    binary_metrics(y_true=df_test.survived.values, y_pred=df_test[i].values)\n",
    "    print(' ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that our ensembler is doing a better job than any idividual model, as expected.\n",
    "\n",
    "Thanks you for going over this example. Feel free to copy, modify, and in general play around with this notebook."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
